Capacity Definitions for General Channels with 
Receiver Side Information 



Michelle Effros, Senior Member, IEEE, Andrea Goldsmith, Fellow, IEEE, 
and Yifan Liang, Student Member, IEEE 



Abstract 

We consider three capacity definitions for general channels with channel side information at the 
receiver, where the channel is modeled as a sequence of finite dimensional conditional distributions 
not necessarily stationary, ergodic, or information stable. The Shannon capacity is the highest rate 
asymptotically achievable with arbitrarily small error probability. The capacity versus outage is the 
highest rate asymptotically achievable with a given probability of decoder-recognized outage. The 
expected capacity is the highest average rate asymptotically achievable with a single encoder and 
multiple decoders, where the channel side information determines the decoder in use. As a special 
case of channel codes for expected rate, the code for capacity versus outage has two decoders: one 
operates in the non-outage states and decodes all transmitted information, and the other operates in the 
outage states and decodes nothing. Expected capacity equals Shannon capacity for channels governed 
by a stationary ergodic random process but is typically greater for general channels. These alternative 
capacity definitions essentially relax the constraint that all transmitted information must be decoded at 
the receiver. We derive capacity theorems for these capacity definitions through information density. 
Numerical examples are provided to demonstrate their connections and differences. We also discuss the 
implication of these alternative capacity definitions for end-to-end distortion, source-channel coding and 
separation. 

Index Terms 

I. Introduction 

Channel capacity has a natural operational definition: the highest rate at which information 
can be sent with arbitrarily low probability of error [1, p. 184]. Channel coding theorems, a 
fundamental subject of Shannon theory, focus on finding information theoretical definitions of 
channel capacity, i.e. expressions for channel capacity in terms of the probabilistic description 
of various channel models. 

In his landmark paper [2], Shannon showed the capacity formula 

$$C = \max_{X} I(X;Y) \tag{1}$$

for memoryless channels. The capacity formula (1) is further extended to the well-known limiting 
expression 

$$C = \lim_{n\to\infty} \sup_{X^n} \frac{1}{n} I(X^n;Y^n) \tag{2}$$

for channels with memory. Dobrushin proved the capacity formula (2) for the class of information 
stable channels in [3]. However, there are channels that do not satisfy the information stable 
condition and for which the capacity formula (2) fails to hold. Examples of information unstable 
channels include the stationary regular decomposable channels [4], the stationary nonanticipatory 
channels [5] and the averaged memoryless channels [6]. In [7] Verdú and Han derived the capacity 

$$C = \sup_{X} \underline{I}(X;Y) \tag{3}$$

for general channels, where $\underline{I}(X;Y)$ is the liminf in probability of the normalized information 
density. The completely general formula (3) does not require any assumption such as memorylessness, 
information stability, stationarity, causality, etc. 

The focus of this paper is on one class of such information unstable channels, the composite 
channel [8]. A composite channel is a collection of channels $\{W_s : s \in \mathcal{S}\}$ parameterized by s, 
where each component channel is stationary and ergodic. The channel realization is determined 
by the random variable S, which is chosen according to some channel state distribution p(s) at 
the beginning of transmission and then held fixed. The composite channel model describes 
many communication systems of practical interest, for instance, applications with stringent 
delay constraints such that a codeword may not experience all possible channel states, systems 
with receiver complexity constraints such that decoding over long blocklengths is prohibited, and 
slow fading wireless channels with channel coherence time longer than the codeword duration. 
Ahlswede studied this class of channels under the name averaged channel and obtained a formula 
for Shannon capacity in [6]. It is also referred to as the mixed channel in [9]. The class of 
composite channels can be generalized to channels for which the optimal input distribution 
induces a joint input-output distribution on which the ergodic decomposition theorem [10, 
Theorem 1.8.2] holds, e.g. stationary distributions defined on complete, separable metric spaces 
(Polish spaces). In this case the channel index s becomes the ergodic mode. 

Shannon's capacity definition, with a focus on stationary and ergodic channels, has enabled 
great insight and design inspiration. However, the definition is based on asymptotically large 
delay and imposes the constraint that all transmitted information be correctly decoded. In 
the case of composite channels the capacity is dominated by the performance of the "worst" 
component channel, no matter how small its probability. This highlights the pessimistic nature 
of the Shannon capacity definition, which forces the use of a single code with arbitrarily 
small error probability. In generalizing the channel model to deal with such scenarios as the 
composite channel above, we relax the constraints and generalize the capacity definitions. These 
new definitions are fundamental, and they address practical design strategies that give better 
performance than traditional capacity definitions. 

Throughout this paper we assume the channel state information is revealed to the receiver 
(CSIR), but no channel state information is available at the transmitter (CSIT). The downlink 
satellite communication system gives an example where the transmitter may not have access to 
CSIT: the terrestrial receivers implement channel estimation but do not have sufficient transmit 
power to feed back the channel knowledge to the satellite transmitter. In other cases, the 
transmitter may opt for simplified strategies which do not implement any adaptive transmission 
based on channel state, and therefore CSIT becomes irrelevant. 

The first alternative definition we consider is capacity versus outage [11]. In the absence of 
CSIT, the transmitter is forced to use a single code, but the decoder may decide whether the 
information can be reliably decoded based on CSIR. We therefore design a coding scheme that 
works well most of the time, but with some maximal probability q, the decoder sees a bad 
channel and declares an outage; in this case, the transmitted information is lost. The encoding 
scheme is designed to maximize the capacity for non-outage states. Capacity versus outage was 
previously examined in [11] for single-antenna cellular systems, and later became a common 
criterion used in multiple-antenna wireless fading channels [12]-[14]. In this work we formalize 
the operational definition of capacity versus outage and also give the information-theoretical 
definition through the distribution of the normalized information density. 

Another method for dealing with channels of variable quality is to allow the receiver to 
decode partial transmitted information. This idea can be illustrated using the broadcast strategy 
suggested by Cover [15]. The transmitter views the composite channel as a broadcast channel 
with a collection of virtual receivers indexed by channel realization S. The encoder uses a 
broadcast code and encodes information as if it were broadcasting to the virtual receivers. The 
receiver chooses the appropriate decoder for the broadcast code based on the channel $W_s$ in 
action. The goal is to identify the point in the broadcast rate region that maximizes the expected 
rate, where the expectation is taken with respect to the state distribution p(s) on $\mathcal{S}$. Shamai 
et al. first derived the expected capacity for Gaussian slowly fading channels in [16] and later 
extended the result to MIMO fading channels in [17]. The formal definition of expected capacity 
was introduced in [8], where upper and lower bounds were also derived for the expected capacity 
of any composite channel. Details of the proofs together with a numerical example of a composite 
binary symmetric channel (BSC) appeared recently in [18]. Application of the broadcast strategy 
to minimize the end-to-end expected distortion is also considered in [19], [20]. 

The alternative capacity definitions are of particular interest for applications where it is 
desirable to maximize average received rate even if it means that part of the transmitted 
information is lost and the encoder does not know the exact delivered rate. In this case the 
receiver either tolerates the information loss or has a mechanism to recover the lost information. 
Examples include scenarios with some acceptable outage probability, communication systems 
using multiresolution or multiple description source codes such that partial received information 
leads to a coarse but still useful source reconstruction at a larger distortion level, feedback 
channels where the receiver tells the transmitter which symbols to resend, or applications where 
lost source symbols are well approximated by surrounding samples. The received rate averaged 
over multiple transmissions is a meaningful metric when there are two time horizons involved: 
a short time horizon at the end of which decoding has to be performed because of stringent 
delay constraint or decoder complexity constraint, and a long time horizon at the end of which 
the overall throughput is evaluated. For example, consider a wireless LAN service subscriber. 
Whenever the user requests a voice or data transmission over the network, he usually expects 
the information to be delivered within a couple of minutes, i.e. the short time horizon. However, 
the service charge is typically calculated on a monthly basis depending on the total or average 
throughput within the entire period, i.e. the long time horizon. 

It is worth pointing out that our capacity analysis does not apply to the compound channel 
[21]-[23]. A compound channel includes a collection of channels but does not assume any 
associated state distribution and therefore has no information density distribution, on which the 
capacity definition relies. Our channel model also excludes the arbitrarily varying channel [21], 
[24], where the channel state changes on each transmission in a manner that depends on the 
channel input in order to minimize the capacity of the chosen encoding and decoding strategies. 

The remainder of this paper is structured as follows. In Section II we review how the 
information theoretical definitions of channel capacity evolved with channel models, and give a 
few definitions that serve as the basis for the development of generalized capacity definitions. 
The Shannon capacity is considered in Section III, where we provide an alternative proof of 
achievability based on a modified notion of typical sets. We also show that the Shannon capacity 
only depends on the support set of the channel state distribution. In Section IV we give a formal 
definition of the capacity versus outage and compare it with the closely-related concept of 
ε-capacity [7]. In Section V we introduce the expected capacity and establish a bijection between 
the expected-rate code and the broadcast channel code. In Section VI we compare capacity 
definitions and their implications through two examples: the Gilbert-Elliott channel and the BSC 
with random crossover probabilities. The implication of these alternative capacity definitions for 
end-to-end distortion, source-channel coding and separation is briefly discussed in Section VII. 
Conclusions are given in Section VIII. 

II. Background 

Shannon in [2] defined the channel capacity as the supremum of all achievable rates R for 
which there exists a sequence of $(2^{nR}, n)$ codes such that the probability of error tends to zero 
as the blocklength n approaches infinity, and showed the capacity formula (1) 

$$C = \max_{X} I(X;Y)$$

for memoryless channels. In proving the capacity formula (1), the converse of the coding theorem 
[1, p. 206] uses Fano's inequality and establishes the right-hand side of (1) as an upper bound 
on the rate of any sequence of channel codes with error probability approaching zero. The direct 
part of the coding theorem then shows any rate below the capacity is indeed achievable. Although 
the capacity formula (1) is a single-letter expression, the direct channel coding theorem requires 
coding over long blocklength to achieve arbitrarily small error probability. The receiver decodes 
by joint typicality with the typical set defined as [1, p. 195] 



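$$A_\epsilon^{(n)} = \left\{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \left| -\frac{1}{n} \log p(x^n) - H(X) \right| < \epsilon,\ \left| -\frac{1}{n} \log p(y^n) - H(Y) \right| < \epsilon,\ \left| -\frac{1}{n} \log p(x^n, y^n) - H(X,Y) \right| < \epsilon \right\} \tag{4}$$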



which relies on the law of large numbers to obtain the asymptotic equipartition property (AEP). 
For channels with memory, the capacity formula (1) generalizes to the limiting expression (2) 

$$C = \lim_{n\to\infty} \sup_{X^n} \frac{1}{n} I(X^n;Y^n).$$

However, the capacity formula (2) does not hold in full generality. Dobrushin proved it for the 
class of information stable channels. The class of information stable channels, including the class 
of memoryless channels as a special case, can be roughly described as having the property that 
the input maximizing the mutual information $I(X^n;Y^n)$ and its corresponding output behave 
ergodically. In a sense, an ergodic sequence is the most general dependent sequence for which 
the strong law of large numbers holds [1, p. 474]. The coding theorem of information stable 
channels follows similarly from that of memoryless channels. 

However, the joint typicality decoding technique cannot be generalized to information unstable 
channels. For general channels, the set $A_\epsilon^{(n)}$ defined in (4) does not have the AEP: 
the probability of $A_\epsilon^{(n)}$ does not approach 1 for large n. We therefore cannot construct channel codes 
that have small error probability and, at the same time, a rate arbitrarily close to (2). Hence the 
right-hand side of (2), although still a valid upper bound on channel capacity, is not necessarily 
tight. In [7] Verdú and Han presented a tight upper bound for general channels and showed its 
achievability through Feinstein's lemma [25]. We provide an alternative proof of achievability 
based on a new notion of typical sets in Section III. 

This information stable condition can be illustrated using the concept of information density. 

Definition 1 (Information Density) Given a joint distribution $P_{X^nY^n}$ on $\mathcal{X}^n \times \mathcal{Y}^n$ with marginal 
distributions $P_{X^n}$ and $P_{Y^n}$, the information density is defined as [26] 

$$i_{X^nY^n}(x^n;y^n) = \log \frac{P_{X^nY^n}(x^n,y^n)}{P_{X^n}(x^n)\,P_{Y^n}(y^n)} = \log \frac{P_{Y^n|X^n}(y^n|x^n)}{P_{Y^n}(y^n)}. \tag{5}$$

The distribution of the random variable $(1/n)\, i_{X^nY^n}(X^n;Y^n)$ is referred to as the information 
spectrum of $P_{X^nY^n}$. It is observed that the normalized mutual information 

$$\frac{1}{n} I(X^n;Y^n) = \sum_{(x^n,y^n)} p(x^n,y^n)\, \frac{1}{n} \log \frac{p(y^n|x^n)}{p(y^n)}$$

is the expectation of the normalized information density 

$$\frac{1}{n} i(x^n;y^n) = \frac{1}{n} \log \frac{p(y^n|x^n)}{p(y^n)}$$

with respect to the underlying joint input-output distribution $p(x^n,y^n)$, i.e. 

$$\frac{1}{n} I(X^n;Y^n) = E_{X^nY^n}\left[ \frac{1}{n} i_{X^nY^n}(X^n;Y^n) \right].$$

Denote by $X^n$ the input distribution that maximizes the mutual information $I(X^n;Y^n)$ and 
by $Y^n$ the corresponding output distribution. The information stable condition [27, Definition 3] 
requires that the normalized information density $(1/n)\, i(X^n;Y^n)$, as a random variable, converges 
in distribution to a constant equal to the normalized mutual information $(1/n)\, I(X^n;Y^n)$ as the 
blocklength n approaches infinity. 

In [7] Verdú and Han derived the capacity formula (3) 

$$C = \sup_{X} \underline{I}(X;Y)$$

for general channels, where $\underline{I}(X;Y)$ is the liminf in probability of the normalized information 
density. In contrast to information stable channels where the distribution of $(1/n)\, i(X^n;Y^n)$ 
converges to a single point, for information unstable channels, even with infinite blocklength the 
support set¹ of the distribution of $(1/n)\, i(X^n;Y^n)$ may still have multiple points or even contain 
an interval. The Shannon capacity equals the infimum of this support set. 

The information spectrum of an information stable channel is demonstrated in the upper plot 
of Fig. 1. As the block length n increases, the convergence of the normalized information density 
to the channel capacity follows from the weak law of large numbers. In the lower plot of Fig. 1 
we show the empirical distribution of $(1/n)\, i(X^n;Y^n)$ for an information unstable channel. The 
distribution of the normalized information density does not converge to a single point, so the 
expression (2) does not equal the capacity, which is given by (3). 





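[Figure 1 not reproduced. Curve labels: n = 1, 10, 1000, ∞. Axis marks: $\lim_{n\to\infty} \frac{1}{n} I(X^n;Y^n)$ (upper panel); $\underline{I}(X;Y)$ and $\lim_{n\to\infty} \frac{1}{n} I(X^n;Y^n)$ (lower panel).]

Fig. 1. Empirical distribution of normalized information density. Upper: information stable channel. Lower: information unstable 
channel. 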
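The contrast described for Fig. 1 can be reproduced with a small simulation. The sketch below is illustrative only and is not taken from the paper: it assumes binary symmetric component channels with uniform inputs, a state known to the receiver, and hypothetical helper names (`bsc_density_sample`, `spectrum_samples`). Under these assumptions each symbol contributes $\log_2(P(y|x)/\tfrac{1}{2})$ bits to the information density.

```python
import math
import random


def bsc_capacity(p):
    """Capacity 1 - h(p) of a BSC with crossover probability 0 < p < 1, in bits."""
    return 1.0 + p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)


def bsc_density_sample(n, p, rng):
    """One sample of (1/n) i(X^n; Y^n) in bits for a BSC(p) with uniform inputs."""
    flips = sum(1 for _ in range(n) if rng.random() < p)
    # A flipped symbol contributes 1 + log2(p); an unflipped one 1 + log2(1 - p).
    return 1.0 + (flips * math.log2(p) + (n - flips) * math.log2(1.0 - p)) / n


def spectrum_samples(n, crossovers, weights, trials, seed=0):
    """Draw the state once per codeword (assumed known at the receiver),
    then sample the normalized information density for that state."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        p = rng.choices(crossovers, weights=weights)[0]
        samples.append(bsc_density_sample(n, p, rng))
    return samples


if __name__ == "__main__":
    for n in (10, 100, 2000):
        stable = spectrum_samples(n, [0.2], [1.0], trials=200)
        mixed = spectrum_samples(n, [0.05, 0.2], [0.5, 0.5], trials=200)
        # The spread shrinks with n for the single BSC but not for the mixture.
        print(n, round(max(stable) - min(stable), 3), round(max(mixed) - min(mixed), 3))
```

For the single BSC the samples concentrate around 1 - h(0.2) as n grows, whereas for the two-state mixture they settle into two clusters, so no single limit point exists.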



III. Shannon Capacity 

We consider a channel W which is statistically modeled as a sequence of n-dimensional 
conditional distributions $W = \{W^n = P_{Z^n|X^n}\}_{n=1}^{\infty}$. For any integer n > 0, $W^n$ is the conditional 
distribution from the input space $\mathcal{X}^n$ to the output space $\mathcal{Z}^n$. Let X and Z denote the input and 
output processes, respectively, for the given sequence of channels. Each process is specified by 
a sequence of finite-dimensional distributions, e.g. $X = \{X^n = (X_1^{(n)}, \cdots, X_n^{(n)})\}_{n=1}^{\infty}$. 

¹ The smallest closed set whose complement has probability measure zero. 
To consider the special case where the decoder has receiver side information not present at the 
encoder, we represent this side information as an additional output of the channel. Specifically, 
we let $Z^n = (S, Y^n)$, where S is the channel side information and $Y^n$ is the output of the channel 
described by parameter S. Throughout, we assume that S is a random variable independent of 
X and unknown to the encoder. Thus for each n 

$$P_{Z^n|X^n}(z^n|x^n) = P_S(s)\, P_{Y^n|X^n,S}(y^n|x^n,s)$$

and the information density (5) can be rewritten as 

$$i_{X^nW^n}(x^n;z^n) = \log \frac{P_{Z^n|X^n}(z^n|x^n)}{P_{Z^n}(z^n)} = \log \frac{P_{Y^n|X^n,S}(y^n|x^n,s)}{P_{Y^n|S}(y^n|s)} = i_{X^nW^n}(x^n;y^n|s). \tag{6}$$

In the following we see that the generalized capacity definitions of composite channels depend 
crucially on information density instead of mutual information. We also denote by $F_X(\alpha)$ the 
limit of the cumulative distribution function (cdf) of the normalized information density, i.e. 

$$F_X(\alpha) = \lim_{n\to\infty} P_{X^nW^n}\left\{ \frac{1}{n} i_{X^nW^n}(X^n;Y^n|S) \le \alpha \right\}, \tag{7}$$

where the subscript emphasizes the input process X. 

Consider a sequence of $(2^{nR}, n)$ codes for channel W, where for any R > 0, a $(2^{nR}, n)$ code 
is a collection of $2^{nR}$ blocklength-n channel codewords and the associated decoding regions. The 
Shannon capacity is defined as the supremum of all rates R for which there exists a sequence 
of $(2^{nR}, n)$ codes with vanishing error probability [2]. Therefore, the Shannon capacity $C(W)$ 
measures the rate that can be reliably transmitted from the encoder and also be reliably received 
at the decoder. We simplify this notation to C if the channel argument is clear from context. 

The achievability and converse theorems for the Shannon capacity of a general channel 

$$C = \sup_{X} \underline{I}(X;Z) = \sup_{X} \underline{I}(X;Y|S) = \sup_{X} \sup\{\alpha : F_X(\alpha) = 0\} \tag{8}$$

are proved, respectively, by Theorems 2 and 5 of [7], using Feinstein's lemma [25], [9, Lemma 
3.4.1], [28, Lemma 3.5.2] and the Verdú-Han lemma [7, Theorem 4]. The special case of 
a composite channel with CSIR follows immediately from this result. We here provide an 
alternative proof of achievability based on a modified notion of typical sets. In the following 
proof we simplify notations by removing the explicit conditioning on the side information S. 

April 26, 2008 DRAFT 

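Encoding: For any input distribution $P_{X^n}$, $\epsilon > 0$, and $R < \underline{I}(X;Y) - \epsilon$, generate the codebook 
by choosing $X^n(1), \cdots, X^n(2^{nR})$ i.i.d. according to the distribution $P_{X^n}(x^n)$. 

Decoding: For any $\epsilon > 0$, the typical set $A_\epsilon^{(n)}$ is defined as 

$$A_\epsilon^{(n)} = \left\{ (x^n,y^n) : \frac{1}{n} i_{X^nW^n}(x^n;y^n) \ge \underline{I}(X;Y) - \epsilon \right\}. \tag{9}$$

Channel output $Y^n$ is decoded to $X^n(i)$, where i is the unique index for which $(X^n(i), Y^n) \in A_\epsilon^{(n)}$. 
An error is declared if more than one or no such index exists. 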

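Error Analysis: We define the following events for all indices $1 \le i, j \le 2^{nR}$, 

$$E_{ji} = \left\{ (X^n(j), Y^n) \in A_\epsilon^{(n)} \mid X^n(i) \text{ sent} \right\}. \tag{10}$$

Conditioned on codeword $X^n(i)$ being sent, the probability of the corresponding error event $E_i$ 

$$E_i = E_{ii}^c \cup \bigcup_{j \ne i} E_{ji}$$

can be bounded by 

$$\Pr(E_i) \le \Pr(E_{ii}^c) + \sum_{j \ne i} \Pr(E_{ji}).$$

Since we generate i.i.d. codewords, $\Pr(E_{ii}^c)$ and $\Pr(E_{ji})$, $j \ne i$, do not depend on the specific 
indices i, j. Assuming equiprobable inputs, the expected probability of error with respect to the 
randomly generated codebook is: 

$$\begin{aligned}
P_e^{(n)} &= \Pr\{\text{error} \mid X^n(1) \text{ sent}\} \\
&\le \Pr(E_{11}^c) + \sum_{i=2}^{2^{nR}} \Pr(E_{i1}) \\
&\le P_{X^nW^n}\left\{ (X^n, Y^n) \notin A_\epsilon^{(n)} \right\} + 2^{nR} \sum_{(x^n,y^n) \in A_\epsilon^{(n)}} P_{X^n}(x^n)\, P_{Y^n}(y^n) \\
&\le \epsilon_n + 2^{n[R - \underline{I}(X;Y) + \epsilon]} \sum_{(x^n,y^n) \in A_\epsilon^{(n)}} P_{X^nW^n}(x^n,y^n),
\end{aligned} \tag{11}$$

where $\epsilon_n = P_{X^nW^n}\{(X^n, Y^n) \notin A_\epsilon^{(n)}\}$ approaches 0 as n grows by definition of $\underline{I}(X;Y)$. The last inequality 
uses (5), (9), and the fact that $(x^n, y^n) \in A_\epsilon^{(n)}$ implies 

$$\frac{1}{n} i_{X^nW^n}(x^n;y^n) = \frac{1}{n} \log \frac{P_{X^nW^n}(x^n,y^n)}{P_{X^n}(x^n)\, P_{Y^n}(y^n)} \ge \underline{I}(X;Y) - \epsilon$$

and consequently 

$$P_{X^n}(x^n)\, P_{Y^n}(y^n) \le 2^{-n[\underline{I}(X;Y) - \epsilon]}\, P_{X^nW^n}(x^n,y^n).$$

From (11), since the remaining sum is at most 1, 

$$P_e^{(n)} \le \epsilon_n + 2^{n[R - \underline{I}(X;Y) + \epsilon]} \to 0$$

for all $R < \underline{I}(X;Y) - \epsilon$ and arbitrary $\epsilon > 0$, which completes our proof. 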

Although a composite channel is characterized by the collection of component channels 
$\{W_s : s \in \mathcal{S}\}$ and the associated probability distribution p(s) on $\mathcal{S}$, the Shannon capacity of a composite 
channel is solely determined by the support set of the channel state distribution p(s). In the case 
of a discrete channel state set $\mathcal{S}$, we only need to know which channel states have positive 
probability. The exact positive value that the probability mass function p(s) assigns to channel 
states is irrelevant in view of the Shannon capacity. In the case of a continuous channel state 
set $\mathcal{S}$, we only need to know the subset of channel states where the probability density function 
is strictly positive. This is formalized in Lemma 1. Before introducing the lemma we need the 
following definition [29, Appendix 8]. 

Definition 2 (Equivalent Probability Measure) A probability measure $p_1$ is absolutely continuous 
with respect to $p_2$, written as $p_1 \ll p_2$, if $p_1(A) = 0$ implies that $p_2(A) = 0$ for any event 
A. Here $p_i(A)$, i = 1, 2, is the probability of event A under probability measure $p_i$. The measures $p_1$ and $p_2$ 
are equivalent probability measures if $p_1 \ll p_2$ and $p_2 \ll p_1$. 

Lemma 1 Consider two composite channels $W_1$ and $W_2$ with component channels from the 
same collection $\{W_s : s \in \mathcal{S}\}$. Denote by $p_1(s)$ and $p_2(s)$, respectively, the corresponding 
channel state distribution of each composite channel. Then $p_1 \ll p_2$ implies $C(W_1) \le C(W_2)$. 
Furthermore, if $p_1$ and $p_2$ are equivalent probability measures, then $C(W_1) = C(W_2)$. 

Intuitively speaking, $p_1 \ll p_2$ if the support set for $W_2$ is a subset of the support set for $W_1$, so 
any input distribution that allows reliable transmission on $W_1$ also allows reliable transmission 
on $W_2$. The measures $p_1$ and $p_2$ are equivalent probability measures if they share the same support set, and this 
guarantees that the corresponding composite channels have the same Shannon capacity. Details 
of the proof are given in Appendix A. 

The equivalent probability measure is a sufficient but not necessary condition for two composite 
channels to have the same Shannon capacity. For example, consider two slow-fading Gaussian 
composite channels. It is possible that two probability measures have no support below the same 
channel gain, but one assigns non-zero probability to states with large capacity while the other 
does not. In this case, the probability measures are not equivalent; nevertheless the Shannon 
capacities of both composite channels are the same. 
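The support-only dependence can be checked numerically in a simple special case. The sketch below is not from the paper: it assumes a finite collection of binary symmetric component channels with CSIR, for which the uniform input is optimal in every state, so that the Shannon capacity reduces to the smallest component capacity over the support. The function names are hypothetical.

```python
import math


def bsc_capacity(p):
    """Capacity 1 - h(p) of a BSC with crossover probability p, in bits."""
    if p in (0.0, 1.0):
        return 1.0
    return 1.0 + p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)


def shannon_capacity(state_dist):
    """state_dist maps a crossover probability to its state probability.

    Assumption: uniform input is optimal for every component BSC, so the
    composite capacity is the worst component capacity over the support."""
    support = [p for p, mass in state_dist.items() if mass > 0.0]
    return min(bsc_capacity(p) for p in support)


# Same support, very different weights: identical Shannon capacity.
w1 = {0.01: 0.98, 0.1: 0.01, 0.3: 0.01}
w2 = {0.01: 0.20, 0.1: 0.30, 0.3: 0.50}
# Support shrunk by removing the worst state: capacity can only increase.
w3 = {0.01: 0.50, 0.1: 0.50, 0.3: 0.00}

if __name__ == "__main__":
    print(shannon_capacity(w1), shannon_capacity(w2), shannon_capacity(w3))
```

Here `w1` and `w2` share a support and therefore a capacity, however small the probability of the worst state, while `w3` drops that state and achieves a higher rate.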

IV. Capacity versus Outage 

The Shannon capacity definition imposes the constraint that all transmitted information be 
correctly decoded at the receiver with vanishing error probability, while in some real systems it is 
acceptable to lose a small portion of the transmitted information as long as there is a mechanism 
to cope with the packet loss. For example, in systems with a receiver complexity constraint, 
decoding over finite blocklength is necessary but in the case of packet loss, ARQ (automatic 
repeat request) protocols are implemented where the receiver requests retransmission of the lost 
information [30], [31]. If the system has a stringent delay constraint, lost information can be 
approximated from the context, for example the block-coded JPEG image transmission over noisy 
channels where missing blocks can be reconstructed in the frequency domain by interpolating 
the discrete cosine transformation (DCT) coefficients of available neighboring blocks [32]. These 
examples demonstrate a new notion of capacity versus outage: the transmitter sends information 
at a fixed rate, which is correctly received most of the time; with some maximal probability q, 
the decoder sees a bad channel and declares an outage, and the transmitted information is lost. 
This is formalized in the following definition: 

Definition 3 (Capacity versus Outage) Consider a composite channel W with CSIR. A $(2^{nR}, n)$ 
channel code for W consists of the following: 

• an encoding function $X^n : \mathcal{U} = \{1, 2, \cdots, 2^{nR}\} \to \mathcal{X}^n$, where $\mathcal{U}$ is the message index set 
and $\mathcal{X}$ is the input alphabet; 

• an outage identification function $I : \mathcal{S} \to \{0, 1\}$, where $\mathcal{S}$ is the set of channel states; 

• a decoding function $g_n : \mathcal{Y}^n \times \mathcal{S} \to \hat{\mathcal{U}} = \{1, 2, \cdots, 2^{nR}\}$, which only operates when I = 1. 

Define the outage probability 

$$P_o^{(n)} = \Pr\{I = 0\}$$

and the error probability in non-outage states 

$$P_e^{(n)} = \Pr\{\hat{U} \ne U \mid I = 1\}.$$

A rate R is outage-q achievable if there exists a sequence of $(2^{nR}, n)$ channel codes such that 
$\lim_{n\to\infty} P_o^{(n)} \le q$ and $\lim_{n\to\infty} P_e^{(n)} = 0$. The capacity versus outage $C_q$ of the channel W with CSIR 
is defined to be the supremum over all outage-q achievable rates. 

In the above definition, $P_o^{(n)}$ is the probability that the decoder, using its side information 
about the channel, determines it cannot reliably decode the received channel output and declares 
an outage. In contrast, $P_e^{(n)}$ is the probability that the receiver decodes improperly given that an 
outage is not declared. Definition 3 can be viewed as an operational definition of the capacity 
versus outage. In parallel to the development of the Shannon capacity, we also give an information 
theoretic definition [1, p. 184] of the capacity versus outage 

$$C_q = \sup_{X} \underline{I}_q(X;Y|S) = \sup_{X} \sup\{\alpha : F_X(\alpha) \le q\}. \tag{12}$$

Notice that $C_0 = C$, so the capacity versus outage is a generalization of the Shannon capacity. 
The achievability proof follows the same typical-set argument given in Section III. The converse 
result likewise follows [7]. Details are given in Appendix B. 
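For a finite state set, the inner supremum in (12) can be evaluated directly. The following sketch is an illustration under stated assumptions rather than the paper's procedure: the input distribution is held fixed (so the outer supremum is dropped), and in state s the normalized information density is assumed to converge to a constant rate, making the limiting cdf a step function. The function name and the example numbers are hypothetical.

```python
def capacity_versus_outage(states, q, tol=1e-12):
    """Evaluate sup{a : F(a) <= q} for a step-function limiting cdf.

    states: list of (state probability, limiting rate) pairs.
    For a strictly between two rates, F(a) is the total probability of the
    states whose rate lies below a, so the supremum is the largest rate whose
    strictly-lower states carry total probability at most q."""
    best = None
    for mass, rate in states:
        if mass <= 0.0:
            continue  # states outside the support never matter
        mass_below = sum(m for m, r in states if r < rate)
        if mass_below <= q + tol and (best is None or rate > best):
            best = rate
    return best


if __name__ == "__main__":
    example = [(0.1, 0.2), (0.3, 0.5), (0.6, 0.9)]  # (probability, rate in bits)
    for q in (0.0, 0.05, 0.1, 0.4):
        print(q, capacity_versus_outage(example, q))
```

With q = 0 the function returns the smallest rate in the support, matching the observation that $C_0 = C$; the value jumps only when q reaches the cumulative probability of the worst states.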

The concept of capacity versus outage was initially proposed in [11] for cellular mobile 
radios. See also [33, Ch. 4] and references therein for more details. A closely-related concept 
of ε-capacity was defined in [7]. However, there is a subtle difference between the two: in the 
definition of ε-capacity the non-zero error probability ε accounts for decoding errors undetected 
at the receiver. In contrast, in the definition of capacity versus outage the receiver declares 
an outage when the channel state does not allow the receiver to decode with vanishing error 
probability. Asymptotically, the probability of error must be bounded by some fixed constant q 
and all errors must be recognized at the decoder. As a consequence, no decoding is performed 
for outage states. If the power consumption to perform receiver decoding becomes an issue, as 
in the case of sensor networks with non-rechargeable nodes or power-conserving mobile devices, 
then we should distinguish between decoding with error and no decoding at all in view of energy 
conservation. 

This subtle difference also has important consequences when we consider end-to-end 
communication performance using source and channel coding. When the outage states are recognized by 
the receiver, it can request a retransmission or simply reconstruct the source symbol by its mean, 
giving an expected distortion equal to the source variance. In contrast, if the receiver cannot 
recognize the decoding error as in the case of an ε-capacity channel code, the reconstruction 
based on the incorrectly decoded symbol may lead to not only large distortion but also loss of 
synchronization in the source code's decoder. 

We can further define the outage capacity $C_q^o = (1 - q) C_q$ as the long-term average rate, 
if the channel is used repeatedly and at each use the channel state is drawn independently 
according to p(s). The transmitter uses a single codebook and sends information at rate $C_q$; the 
receiver can correctly decode the information a proportion (1 − q) of the time and turns itself 
off a proportion q of the time. The outage capacity is a meaningful metric if we are only 
interested in the fraction of correctly received packets and approximate the unreliable packets 
by surrounding samples. In this case, optimizing over the outage probability q to maximize $C_q^o$ 
guarantees performance that is at least as good as the Shannon capacity and may be far better. As 
another example, if all information must be correctly decoded eventually, the packets that suffer 
an outage have to be retransmitted. This demands some repetition mechanism that is usually 
implemented in the link-layer error control of data communication. The number of channel uses 
K to transmit a packet of size $N = C_q$ bits has a geometric distribution 

$$\Pr\{K = k\} = q^{k-1}(1 - q),$$

and the expected value is $1/(1-q) = C_q / C_q^o$, which also illustrates $C_q^o$ as a measure of the long-term 
average throughput. 
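The expected number of channel uses follows from the standard mean of a geometric distribution. The short derivation below is a worked check, assuming outages occur independently across retransmissions with probability q < 1 each, as in the repeated-use model above.

```latex
\begin{align*}
E[K] &= \sum_{k=1}^{\infty} k\, q^{k-1} (1-q)
      = (1-q)\, \frac{d}{dq} \sum_{k=0}^{\infty} q^{k}
      = (1-q)\, \frac{1}{(1-q)^{2}}
      = \frac{1}{1-q}, \\
\frac{N}{E[K]} &= (1-q)\, C_q = C_q^{o}.
\end{align*}
```

The second line reads as bits delivered per channel use averaged over retransmissions, which is the throughput interpretation of $C_q^o$.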

Next we briefly analyze the capacity versus outage from a computational perspective. We need 
the following definition before we proceed: 

Definition 4 (Probability-q Compatible Subchannel) Consider a composite channel W with 
state distribution p(s), $s \in \mathcal{S}$. Consider another channel $W_q$ where the channel state set $\mathcal{S}_q$ is a 
subset of $\mathcal{S}$ ($\mathcal{S}_q \subseteq \mathcal{S}$). $W_q$ is a probability-q compatible subchannel of W if $\Pr\{\mathcal{S}_q\} \ge 1 - q$. 

Note that $W_q$ is not exactly a composite channel since we only specify the state set $\mathcal{S}_q$ but not 
the corresponding state distribution over $\mathcal{S}_q$. However, we will only be interested in the Shannon 
capacity of $W_q$, and as pointed out by Lemma 1 the exact distribution over $\mathcal{S}_q$ is irrelevant to 
determine this capacity. 

The capacity versus outage as defined in (fT2l) requires a two-stage optimization. In the first 
step we fix the input distribution X and find the probability-g compatible subchannel that yields 
the highest achievable rate. In the second step we optimize over the distribution of X. This view 
is more convenient if the optimal input distribution can be easily determined. We then evaluate 
the achievable rate of each component channel with this optimal input and declare outage for 
those with the lowest rates. As an example, consider a slow-fading MIMO channel with m 
transmit antennas. Assume the channel matrix H has i.i.d. Rayleigh fading coefficients. The 
outage probability associated with transmit rate $R$ is known to be [34]

$$P_o(R) = \inf_{Q \succeq 0,\ \mathrm{Tr}(Q) \le m} \Pr\left[ \log\det\left( I + \frac{\mathrm{SNR}}{m} H Q H^{\dagger} \right) < R \right]$$

and the capacity versus outage is $C_q = \sup\{R : P_o(R) \le q\}$. Although the optimal input
covariance matrix $Q$ is unknown in general, it is shown in [14] that there is no loss of generality
in assuming $Q = I$ in the high SNR regime and the corresponding capacity versus outage
simplifies to

$$C_q = \sup\left\{ R : \Pr\left[ \log\det\left( I + \frac{\mathrm{SNR}}{m} H H^{\dagger} \right) < R \right] \le q \right\}.$$
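
To make the quantile interpretation concrete, the sketch below evaluates the capacity versus outage in the simplest special case, which is an assumption made here for brevity and not an example from the paper: a single transmit and a single receive antenna, so that $H$ reduces to a scalar $h$ with $|h|^2$ exponentially distributed with unit mean, and logarithms taken to base 2. In that case the outage probability has a closed form and $C_q = \log_2(1 - \mathrm{SNR}\,\ln(1-q))$, which the Monte Carlo quantile should approximate. All function names are hypothetical.

```python
import math
import random

def siso_outage_capacity_closed_form(snr, q):
    """C_q for the rate log2(1 + snr*|h|^2) with |h|^2 ~ Exp(1)."""
    # Pr[log2(1 + snr*g) < R] = 1 - exp(-(2^R - 1)/snr); set equal to q and solve for R.
    return math.log2(1.0 - snr * math.log(1.0 - q))

def siso_outage_capacity_monte_carlo(snr, q, trials=200_000, seed=1):
    """Empirical q-quantile of the instantaneous rate over random fading draws."""
    rng = random.Random(seed)
    rates = sorted(
        math.log2(1.0 + snr * rng.expovariate(1.0)) for _ in range(trials)
    )
    return rates[int(q * trials)]
```

The same quantile computation would apply to the matrix case with the scalar rate replaced by the log-determinant, at the cost of a linear-algebra dependency.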

By reversing the order of the two optimization steps we have another interpretation of capacity
versus outage

$$C_q = \sup_{W_q} C(W_q). \qquad (13)$$

Here we first determine the Shannon capacity of each probability-$q$ compatible subchannel,
then optimize by choosing the one with the highest Shannon capacity. This view highlights the
connection between $C_q$ of a composite channel and the Shannon capacity of its probability-$q$
compatible subchannels, and is more convenient if there is an intrinsic "ordering" of the
component channels. For example, consider a degraded collection of channels where for any
channel states $s_1$ and $s_2$ there exists a transition probability $p(y_2|y_1)$ such that

$$p(y_2 | x, s_2) = \sum_{y_1} p(y_1 | x, s_1)\, p(y_2 | y_1).$$

The degraded relationship can be extended to the less noisy and more capable conditions [35].
The more capable condition requires

$$I(X; Y | s_1) \ge I(X; Y | s_2) \qquad (14)$$

for any input distribution $X$. It is the weakest of all three but suffices to establish an ordering.
The optimal probability-$q$ compatible subchannel $W_q^*$ has the smallest set of channel states $\mathcal{S}_q^*$
such that any component channel within $\mathcal{S}_q^*$ is more capable than a component channel not in
$\mathcal{S}_q^*$. The Shannon capacity of $W_q^*$ equals the capacity versus outage-$q$ of the original channel
$W$.
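
For a finite, ordered family of component channels the second interpretation reduces to a simple procedure: discard the worst states as long as their total probability does not exceed $q$, and report the capacity of the worst state that remains. The sketch below illustrates this under the assumption that each state is summarized by a pair (probability, capacity) and that the ordering by capacity agrees with the more capable ordering, as it does for a family of BSCs. The function names and the numerical tolerance are illustrative choices.

```python
import math

def binary_entropy(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def capacity_versus_outage(states, q):
    """states: list of (probability, capacity) pairs of an ordered channel family."""
    ordered = sorted(states, key=lambda pc: pc[1])   # worst state first
    dropped = 0.0
    for prob, cap in ordered:
        if dropped + prob > q + 1e-12:   # this state cannot be declared in outage
            return cap
        dropped += prob
    return ordered[-1][1]                # q >= 1: every state may be dropped
```

With two BSC states this reproduces the behavior expected of a two-state composite channel: the result is the capacity of the bad state until $q$ reaches the probability of that state, and the capacity of the good state afterwards.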

V. Expected Capacity 

The definition of capacity versus outage in Section IV is essentially an all-or-nothing game:
the receiver may declare outage for undesirable channel states but is otherwise required to 
decode all transmitted information. There are examples where partial received information is 
useful. Consider sending a multi-resolution source code over a composite channel. Decoding 
all transmitted information leads to reconstruction with the lowest distortion. However, in the 
case of inferior channel quality, it still helps to decode partial information and get a coarse 
reconstruction. Although the transmitter sends information at a fixed rate, the notion of expected 
capacity allows the receiver to decide in expectation how much information can be correctly 
decoded based on channel realizations. 

Next we introduce some notation which is useful for the formal definition of the expected
capacity. Conventionally we represent information as a message index, cf. the Shannon capacity
definition [1, p. 193] and the capacity versus outage definition in Section IV. To deal with partial
information, here we represent information as a block of bits $(b_i)_{i \in \mathcal{I}}$, where $\mathcal{I}$ is the set of bit
indices. Denote by

$$\mathcal{M}(\mathcal{I}) = \{ (b_i)_{i \in \mathcal{I}} : b_i \text{ binary} \}$$

the set of all possible blocks of information bits with bit indices from the set $\mathcal{I}$. Each element
in $\mathcal{M}(\mathcal{I})$ is a bit-vector of length $|\mathcal{I}|$, so the size of the set $\mathcal{M}(\mathcal{I})$ is $2^{|\mathcal{I}|}$. If another index set
$\tilde{\mathcal{I}}$ is a proper subset of $\mathcal{I}$ ($\tilde{\mathcal{I}} \subset \mathcal{I}$), then $\mathcal{M}(\tilde{\mathcal{I}})$ represents some partial information with respect
to the full information $\mathcal{M}(\mathcal{I})$. This representation generalizes the conventional representation
using message indices.

^Assuming each component channel is stationary and ergodic, the mutual information in (14) is well defined.

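Definition 5 (Expected Capacity) Consider a composite channel $W$ with channel state
distribution $p(s)$. A $(2^{nR_t}, \{2^{nR_s}\}, n)$ code consists of the following:

• an encoding function

$$f_n : \mathcal{M}(\mathcal{I}_{n,t}) = \{ (b_i)_{i \in \mathcal{I}_{n,t}} \} \to \mathcal{X}^n,$$

where $\mathcal{I}_{n,t} = \{1, 2, \cdots, nR_t\}$ is the index set of the transmitted information bits and $\mathcal{X}$ is
the input alphabet;

• a collection of decoders, one for each channel state $s$,

$$g_{n,s} : \mathcal{Y}^n \to \mathcal{M}(\mathcal{I}_{n,s}) = \{ (\hat{b}_i)_{i \in \mathcal{I}_{n,s}} \},$$

where $\mathcal{I}_{n,s} \subseteq \mathcal{I}_{n,t}$ is the set of indices of the decodable information bits in channel state $s$.

Define the decoding error probability associated with channel state $s$ as

$$P_e^{(n,s)} = \Pr\Big\{ \bigcup_{i \in \mathcal{I}_{n,s}} \{ \hat{b}_i \neq b_i \} \Big\}$$

and the average error probability

$$P_e^{(n)} = E_S P_e^{(n,S)} = \int P_e^{(n,s)} p(s)\, ds.$$

A rate $R = E_S R_S$ is achievable in expectation if there exists a sequence of $(2^{nR_t}, \{2^{nR_s}\}, n)$ codes
with average error probability $\lim_{n \to \infty} P_e^{(n)} = 0$. The expected capacity $C^e(W)$ is the supremum
of all rates $R$ achievable in expectation.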

We want to emphasize a few subtle points in the above definition. In channel state $s$ the receiver
only decodes those information bits $(b_i)$ with indices $i \in \mathcal{I}_{n,s}$. Decoding error occurs if any
of the decoded information bits $(\hat{b}_i)$ is different from the transmitted information bit $(b_i)$. No
attempt is made to decode information bits with indices outside the index set $\mathcal{I}_{n,s}$, hence these
information bits are irrelevant to the error analysis for channel state $s$.

The cardinality $nR_s$ of the index set $\mathcal{I}_{n,s}$ depends only on the blocklength $n$ and the channel
state $s$. Among the transmitted $nR_t$ information bits, the transmitter and the receiver can agree
on the set of decodable information bits for each channel state before transmission starts, i.e. not
only the cardinality of $\mathcal{I}_{n,s}$ but the set $\mathcal{I}_{n,s}$ itself is uniquely determined by the channel state $s$.
Alternatively, for the same channel state $s$, the receiver may choose to decode different sets of
information bits depending on the actual channel output $y^n$, although all these sets are of the
same cardinality $nR_s$. In this case the set of decodable information bits for each channel state
is unknown to the transmitter beforehand.

We first look at the case where the transmitter and the receiver agree on the set of decodable 
information bits for each channel state. In a composite channel the transmitter can view the 
channel as a broadcast channel with a collection of virtual receivers indexed by channel
realization $S$. The encoder uses a broadcast code to transmit to the virtual receivers. The receiver
uses the side information S to choose the appropriate decoder. Before we proceed to establish 
a connection between the expected capacity of a composite channel and the capacity region of 
a broadcast channel, we state the following definition of the broadcast capacity region, which is 
a direct extension from the two-user case [1, p. 421] to the multi-user case. 

Consider a broadcast channel with $m$ receivers. The receivers are indexed by the set $\mathcal{S}$ with
cardinality $m$, which is reminiscent of the index set of channel states in a composite channel.
The power set $\mathcal{P}(\mathcal{S})$ (or simply $\mathcal{P}$) is the set of all subsets of $\mathcal{S}$. The cardinality of the power
set is $|\mathcal{P}(\mathcal{S})| = 2^m$.

Definition 6 (Broadcast Channel Capacity Region) A $(\{2^{nR_p}\}, n)$ code for a broadcast
channel consists of the following:

• an encoder

$$f_n : \prod_{p \in \mathcal{P},\, p \neq \phi} \mathcal{M}_p \to \mathcal{X}^n,$$

where $\phi$ is the empty set, $p \in \mathcal{P}(\mathcal{S})$ is a non-empty subset of users, and $\mathcal{M}_p = \{1, 2, \cdots, 2^{nR_p}\}$
is the message set intended for users within the subset $p$ only. The short-hand notation
$\prod_p \mathcal{M}_p$ denotes the Cartesian product of the corresponding message sets;

• a collection of $m$ decoders, one for each user $s$,

$$g_{n,s} : Y_s^n \to \prod_{p \in \mathcal{P},\, s \in p} \mathcal{M}_p,$$

where $Y_s^n$ is the channel output for user $s$.

Define the error event $E_s$ for each user as

$$E_s = \Big\{ g_{n,s}(Y_s^n) \neq (M_p)_{p \in \mathcal{P},\, s \in p} \Big\}$$

and the overall probability of error as

$$P_e^{(n)} = \Pr\Big\{ \bigcup_{s} E_s \Big\}.$$

A rate vector $\{R_p\}_{p \in \mathcal{P}}$ is broadcast achievable if there exists a sequence of $(\{2^{nR_p}\}, n)$ codes
with $\lim_{n \to \infty} P_e^{(n)} = 0$. The broadcast channel capacity region $\mathcal{C}_{BC}$ is the convex closure of all
broadcast achievable rate vectors.

In the above definition, we explicitly distinguish between private and common information. The
message set $\mathcal{M}_p$ contains information decodable by all users $s \in p$ but no others. For instance,
in a three-user BC we have private information $\mathcal{M}_1$, $\mathcal{M}_2$, $\mathcal{M}_3$, information for any pair of users
$\mathcal{M}_{12}$, $\mathcal{M}_{23}$, $\mathcal{M}_{13}$, and the common information $\mathcal{M}_{123}$. The total number of message sets is
$2^m - 1$ since the empty set is excluded.

We establish a connection between the expected capacity of a composite channel and the 
capacity region of a broadcast channel through the following theorem. For ease of notation we 
state the theorem for a finite number of users (channel states). The result can be generalized to 
an infinite number of users (continuous channel state alphabets) using the standard technique 
of [36, Ch. 7], i.e. to first discretize the continuous channel state distribution and then take the 
limiting case. 

Theorem 1 Consider a composite channel characterized by the joint distribution

$$P_{W^n}(s, y^n | x^n) = P_S(s)\, P_{Y^n | X^n, S}(y^n | x^n, s),$$

and the corresponding BC with the channel for each receiver satisfying

$$P_{Y_s^n | X^n}(y_s^n | x^n) = P_{Y^n | X^n, S}(y_s^n | x^n, s). \qquad (15)$$

Denote by $C^e$ the expected capacity of the composite channel and by $\mathcal{C}_{BC}$ the capacity region of
the corresponding BC, as in Definitions 5 and 6, respectively. If the set of decodable information
bits in the composite channel is uniquely determined by the channel state $S$, then the expected
capacity satisfies

$$C^e = \sup_{(R_p) \in \mathcal{C}_{BC}} \sum_{p \in \mathcal{P}} R_p \sum_{s \in p} P_S(s) = \sup_{(R_p) \in \mathcal{C}_{BC}} \sum_{s \in \mathcal{S}} P_S(s) \sum_{p \in \mathcal{P},\, s \in p} R_p. \qquad (16)$$

The proof establishes a two-way mapping: any $(\{2^{nR_p}\}, n)$ code for the broadcast channel can
be mapped to a $(2^{nR_t}, \{2^{nR_s}\}, n)$ expected-rate code for the composite channel and vice versa,
where the mapping satisfies $R_s = \sum_{p:\, s \in p} R_p$ for each channel state $s$. The details are given in
Appendix C.
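
The bookkeeping behind the mapping can be illustrated with a few lines of code. The sketch below is illustrative only: the state labels, probabilities and rate values are made-up numbers (no claim is made that they lie in any capacity region), and the function names are hypothetical. It enumerates the $2^m - 1$ non-empty subsets of states, forms the per-state rate as the sum of the rates of all message sets decodable in that state, and confirms that summing over states or over subsets gives the same expected rate.

```python
from itertools import combinations

def nonempty_subsets(states):
    """All non-empty subsets p of the state set, as frozensets."""
    return [
        frozenset(c)
        for size in range(1, len(states) + 1)
        for c in combinations(states, size)
    ]

def rate_in_state(rates, s):
    """R_s: total rate of all message sets M_p with s in p."""
    return sum(r for p, r in rates.items() if s in p)

def expected_rate_by_state(rates, p_s):
    """Sum over states of P_S(s) * R_s."""
    return sum(prob * rate_in_state(rates, s) for s, prob in p_s.items())

def expected_rate_by_subset(rates, p_s):
    """Sum over subsets of R_p * Pr{S in p}."""
    return sum(r * sum(p_s[s] for s in p) for p, r in rates.items())
```

Both orderings of the double sum agree for any rate assignment, which is the identity between the two expressions for the expected rate.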

Although we have introduced a new notion of capacity, the connection established in Theorem
1 shows that the tools developed for broadcast codes can be applied to derive corresponding
expected capacity results, with the addition of an optimization to choose the point on the BC 
rate region boundary that maximizes the expected rate. For example, in [17] some suboptimal 
approaches, including super-majorization and one-dimensional approximation, were introduced 
to analyze the expected capacity of a single-user slowly fading MIMO channel. After the full 
characterization of the MIMO BC capacity region through the work [37]-[41], the expected 
capacity of a slowly fading MIMO channel can be obtained by choosing the optimal operating 
point on the boundary of the dirty-paper coding (DPC) region. 

The connection in Theorem 1 also shows that any expected-rate code designed for a composite
channel can be put into the framework of BC code design. Strategies like layered source coding
with progressive transmission, proposed in [42], immediately generalize to the broadcast coding
problem. Assuming there are only two channel states $s_1$ and $s_2$, this strategy divides the entire
transmission block into two segments. The information transmitted in the first segment is intended
for both states, and that in the second segment is intended for the better channel state $s_2$ only.
This strategy can be easily mapped to a BC code with individual information $\mathcal{M}_2$ and common
information $\mathcal{M}_{12}$, and orthogonal channel access. Furthermore, the complexity of deriving a
single point on the BC region boundary is similar to that of deriving the expected capacity 
under a specific channel state distribution. The entire BC region boundary can be traced out by 
varying the channel state distributions. 

We want to emphasize that in Theorem 1 the condition that the transmitter knows the set
of decodable information bits in advance is not superfluous. If the receiver chooses to decode
different sets of information bits depending on the actual channel output $Y^n$, and consequently
the transmitter does not know the set of decodable information bits for each state $s$, then the
mapping between expected-rate codes and BC codes may not exist. In the following we give
an example where the expected capacity exceeds the supremum of expected rates achievable by
BC codes. Consider a binary erasure channel (BEC) where the erasure probability takes two
equiprobable values $0 \le \alpha_1 < \alpha_2 < 1$. In Appendix D we show that the maximum expected
rate achievable by BC codes is

$$R = \max\left\{ 1 - \alpha_2,\ \frac{1 - \alpha_1}{2} \right\}. \qquad (17)$$

However, we can transmit uncoded information bits directly over this composite BEC. In the
limit of large blocklength $n$, the receiver can successfully decode $n(1 - \alpha_i)$ bits for channel
states $\alpha_i$, $i = 1, 2$, by simply inspecting the channel output, although these successfully decoded
information bits cannot be determined at the transmitter a priori. Overall the expected capacity

$$C^e = 1 - \frac{\alpha_1 + \alpha_2}{2}$$

exceeds the maximum expected rate achievable by BC codes. Notice, however, these two channel
codes are extremely different from an end-to-end coding perspective. The broadcast strategy may 
be combined with a multiresolution source code. In contrast, the source coding strategy required 
for the uncoded case is a multiple description source code with single-bit descriptions. Due 
to this difference, it is not obvious which scenario yields the lower end-to-end distortion. The 
comparison depends on the channel state distribution and the rate-distortion function of the 
source. 
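
The gap between the two strategies in the BEC example is easy to evaluate. The sketch below is illustrative and rests on one assumption that is not spelled out in this section: the broadcast rate pairs are parametrized by a time-sharing fraction $t$, with rate $t(1-\alpha_2)$ decodable in both states and rate $(1-t)(1-\alpha_1)$ decodable only in the better state. Under that assumption the best expected rate over $t$ coincides with the maximum in (17), and the uncoded expected rate $1 - (\alpha_1 + \alpha_2)/2$ is strictly larger. Function names and sample values are hypothetical.

```python
def bc_expected_rate(alpha1, alpha2, steps=1000):
    """Best expected rate over time-sharing fractions t in [0, 1] (equiprobable states)."""
    best = 0.0
    for k in range(steps + 1):
        t = k / steps
        common = t * (1 - alpha2)           # decoded in both states
        private = (1 - t) * (1 - alpha1)    # decoded only in the better state
        best = max(best, common + 0.5 * private)
    return best

def bc_expected_rate_closed_form(alpha1, alpha2):
    """The maximum of the two extreme operating points."""
    return max(1 - alpha2, (1 - alpha1) / 2)

def uncoded_expected_rate(alpha1, alpha2):
    """Average fraction of unerased bits when sending uncoded bits."""
    return 1 - (alpha1 + alpha2) / 2
```

Because the expected rate is linear in $t$, the scan always ends at one of the two endpoints, which is why the closed form is a maximum of two terms.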

Regardless of the transmitter's knowledge about decodable information bits, we show that
$C^e$ satisfies the lower bound $C^e \ge \sup_q C_q^o$ and the upper bound

$$C^e \le \sup_X \limsup_{n \to \infty} E_S E_{X^n Y^n | S} \left[ \frac{1}{n} i_{X^n W^n}(X^n; Y^n | S) \right]. \qquad (18)$$

The lower bound is achieved using the channel code for capacity versus outage-$q$, which achieves
a rate $C_q$ a proportion $(1 - q)$ of the time and zero otherwise. For the upper bound, we assume
channel side information is provided to the transmitter (CSIT) so it can adapt the transmission
rate to the channel state. In this case, the achievable expected rate can only be improved. The
proof is given in Appendix E.

VI. Examples 

In this section we consider some examples to illustrate various capacity definitions.

A. Gilbert-Elliott Channel

The Gilbert-Elliott channel [43] is a two-state Markov chain, where each state is a BSC as
shown in Fig. 2. The crossover probabilities for the "good" and "bad" BSCs satisfy $0 \le p_G <
p_B \le 1/2$. The transition probabilities between the states are $g$ and $b$ respectively. The initial
state distribution is given by $\pi_G$ and $\pi_B$ for states $G$ and $B$. We let $X_n \in \{0, 1\}$, $Y_n \in \{0, 1\}$,
and $Z_n = X_n \oplus Y_n$ denote the channel input, output, and error on the $n$th transmission. We then
study capacity definitions when the channel characteristics of stationarity and ergodicity change
with the parameters.




Fig. 2. Gilbert-Elliott Channel 

Example 1: Ergodic Case, Stationary or Non-Stationary

When $\pi_G = g/(g + b)$ and $\pi_B = b/(g + b)$, the Gilbert-Elliott channel is stationary and ergodic.
In this case the information density $\frac{1}{n} i_{X^n W^n}(X^n; Y^n)$ converges to a $\delta$-function at the average
mutual information, so capacity equals average mutual information as usual. Therefore the
Shannon capacity $C$ is equal to the expected capacity $\pi_G C_G + \pi_B C_B$, where $C_G = 1 - h(p_G)$,
$C_B = 1 - h(p_B)$ and $h(p) = -p \log p - (1 - p) \log(1 - p)$ is the binary entropy function.

This is a single-state composite channel. Since any transmission may experience either a good 
or a bad channel condition, the receiver has no basis for choosing to declare an outage on certain 
transmissions and not on others. Capacity versus outage equals Shannon capacity in this case. 

If $\pi_G \neq g/(g + b)$ but $b$ and $g$ are nonzero, then the Gilbert-Elliott channel is ergodic but not
stationary. However, the distribution on the states G and B converges to a stationary distribution. 
Thus the channel is asymptotically mean stationary, and the definitions of capacity have the same 
values as in the stationary case. 

Example 2: Stationary and Nonergodic Case

We now set $g = b = 0$. So the initial channel state is chosen according to probabilities $\{\pi_G, \pi_B\}$
and then remains fixed for all time. The Shannon capacity equals that of the bad channel ($C =
C_B$). The capacity versus outage-$q$ is $C_q = C_B$ if the outage probability $q < \pi_B$ and $C_q = C_G$
otherwise. The loss incurred from lack of side information at the encoder is that the expected
capacity is strictly less than the average of individual capacities $\pi_B C_B + \pi_G C_G$ and is equal to
[15]

$$\max_{0 \le r \le 1/2} 1 - h(r * p_B) + \pi_G [h(r * p_G) - h(p_G)], \qquad (19)$$

where $\alpha * \beta = \alpha(1 - \beta) + (1 - \alpha)\beta$. The interpretation here is that the broadcast code achieves
rate $1 - h(r * p_B)$ for the bad channel and an additional rate $h(r * p_G) - h(p_G)$ for the good
channel, so the average rate is the expected capacity.
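
The one-dimensional maximization over $r$ can be carried out directly. The following is a minimal sketch, assuming base-2 logarithms and a plain grid search in place of an analytical solution; the function names, the grid resolution and the sample parameters are illustrative. It maximizes $1 - h(r * p_B) + \pi_G[h(r * p_G) - h(p_G)]$ over $r \in [0, 1/2]$ and returns both the value and the maximizing $r$.

```python
import math

def binary_entropy(x):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def conv(a, b):
    """Binary convolution a * b = a(1-b) + (1-a)b."""
    return a * (1.0 - b) + (1.0 - a) * b

def expected_capacity_two_state(p_g, p_b, pi_g, steps=5000):
    """Grid search over r in [0, 1/2] of the two-layer broadcast expected rate."""
    best_val, best_r = -1.0, 0.0
    for k in range(steps + 1):
        r = 0.5 * k / steps
        base = 1.0 - binary_entropy(conv(r, p_b))                 # decoded in both states
        extra = binary_entropy(conv(r, p_g)) - binary_entropy(p_g)  # good state only
        val = base + pi_g * extra
        if val > best_val:
            best_val, best_r = val, r
    return best_val, best_r
```

The endpoints $r = 0$ and $r = 1/2$ recover the Shannon capacity $C_B$ and the rate $\pi_G C_G$ of a code aimed at the good state only, so the returned value is never below either of them and never above the average $\pi_B C_B + \pi_G C_G$.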

Using the Lagrangian multiplier method we can obtain $r^*$ which maximizes (19). Namely if
we define

$$k = \frac{\pi_G}{\pi_B}, \quad A = \frac{1 - 2p_B}{1 - 2p_G}, \quad f(p_1, p_2) = \frac{\log(1/p_1 - 1)}{\log(1/p_2 - 1)},$$

then $r^* = 0$ if $k \le A f(p_B, p_G)$; $r^* = 1/2$ if $k \ge A^2$; and $r^*$ solves $f(r * p_G, r * p_B) = A/k$
otherwise.



B. BSC with random crossover probabilities 

In the non-ergodic case, the Gilbert-Elliott channel is a two-state channel, where each state
corresponds to a BSC with a different crossover probability. We now generalize that example to
allow more than two states. We consider a BSC with random crossover probability $0 \le p \le 1/2$.
At the beginning of time, $p$ is chosen according to some distribution $f(p)$ and then held fixed.
We also use $F(p) = \int_0^p f(s)\, ds$ to denote the cumulative distribution function. Like the
non-ergodic Gilbert-Elliott channel, this is a multi-state composite channel provided $\{p : f(p) > 0\}$
has cardinality at least two. The Shannon capacity is $C = 1 - h(p^*)$ where

$$p^* = \sup\{p : f(p) > 0\} = \inf\{p : F(p) = 1\},$$

and the capacity versus outage-$q$ is $C_q = 1 - h(p_q)$ where $p_q = \inf\{p : F(p) \ge 1 - q\}$.
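
As an illustration of these definitions, the sketch below evaluates them for one particular state distribution chosen here as an assumption: $p$ uniform on $[0, 1/2]$, so that $F(p) = 2p$ and $p_q = (1-q)/2$. It also evaluates the outage capacity $(1-q)C_q$, the long-term rate of an outage code, and locates its maximizer on a grid. Function names and the grid resolution are illustrative.

```python
import math

def binary_entropy(x):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def capacity_versus_outage_uniform(q):
    """C_q = 1 - h(p_q) with p_q = inf{p : F(p) >= 1 - q} and F(p) = 2p."""
    p_q = (1.0 - q) / 2.0
    return 1.0 - binary_entropy(p_q)

def outage_capacity_uniform(q):
    """Rate C_q is delivered a fraction (1 - q) of the time, zero otherwise."""
    return (1.0 - q) * capacity_versus_outage_uniform(q)

def best_outage_probability(steps=1000):
    """Grid maximizer of the outage capacity over q in [0, 1]."""
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=outage_capacity_uniform)
```

At $q = 0$ the capacity versus outage reduces to the Shannon capacity $1 - h(1/2) = 0$ of this channel, while the outage capacity vanishes at both $q = 0$ and $q = 1$ and is maximized strictly in between.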

We consider a broadcast approach on this channel to achieve the expected capacity. The
receiver is equivalent to a continuum of ordered users, each indexed by the BSC crossover
probability $p$ and occurring with probability $f(p)\,dp$. If the set $\{p : f(p) > 0\}$ is infinite, then
the transmitter sends an infinite number of layers of coded information and each user decodes
an incremental rate $|dR(p)|$ corresponding to its own layer. Since the BSC broadcast channel is
degraded, a user with crossover probability $p$ can also decode layers indexed by larger crossover
probabilities, therefore we achieve a rate of

$$R(p) = -\int_p^{1/2} dR(p) \qquad (20)$$

for receiver $p$. The problem of determining the expected capacity then boils down to the
characterization of the broadcast rate region and the choice of the point on that region that
maximizes $\int_0^{1/2} R(p) f(p)\, dp$.

In the discrete case with $N$ users, assuming $0 \le p_1 < \cdots < p_N \le 1/2$, the capacity region
is shown to be [44]

$$\{ R = (R_i)_{1 \le i \le N} : R_i = R(p_i) = h(r_i * p_i) - h(r_{i-1} * p_i) \} \qquad (21)$$

where $0 = r_0 \le r_1 \le \cdots \le r_N = 1/2$. Since the original broadcast channel is stochastically
degraded it has the same capacity region as a cascade of $N$ BSCs. The capacity region boundary
is traced out by augmenting $(N - 1)$ auxiliary channels [44] and varying the crossover
probabilities of each. For each $i$, $r_i$ equals the overall crossover probability for auxiliary channels 1
up to $i$. See Fig. 3 for an illustration. The resulting expected capacity is

$$C^e = \max_{0 = r_0 \le \cdots \le r_N = 1/2} \sum_{i=1}^{N} f(p_i) \sum_{j=i}^{N} [h(r_j * p_j) - h(r_{j-1} * p_j)].$$



Fig. 3. BSC broadcast channel with auxiliary channels for random coding

We extend the above result to the continuous case with an infinite number of auxiliary channels.
In this case we define a monotonically increasing function $r(p)$ equal to the overall crossover
probability of auxiliary channels up to that indexed by $p$. In the following we use $r(p)$ and $r_p$
interchangeably. For the layer indexed by $p$, the incremental rate is

$$-dR(p) = h(p * r_p) - h(p * r_{p-dp}).$$

Using the first order approximation $r_{p-dp} \approx r_p - r'_p\, dp$ and $h(x - \delta) \approx h(x) - h'(x)\delta$ for small
$\delta$, we obtain

$$\begin{aligned}
-dR(p) &= h(p * r_p) - h(p * r_{p-dp}) \\
&\approx h(p * r_p) - h(p * r_p - (1 - 2p) r'_p\, dp) \\
&\approx \log\left(\frac{1}{p * r_p} - 1\right) (1 - 2p)\, r'_p\, dp.
\end{aligned}$$

Note here $\delta = (1 - 2p) r'_p\, dp$ is a small variation, and we do not explicitly address the problematic
limiting case $h'(x) \to \infty$ as $x$ approaches zero.
Overall the expected rate is

$$\int_0^{1/2} f(p) R(p)\, dp = -\int_0^{1/2} F(p)\, dR(p) = \int_0^{1/2} F(p) \log\left(\frac{1}{p * r_p} - 1\right) (1 - 2p)\, r'_p\, dp. \qquad (22)$$

The optimal $r(p)$ maximizing the expected rate can be found through the calculus of
variations. Define $S(p, r_p, r'_p)$ as

$$S(p, r_p, r'_p) = F(p) \log\left(\frac{1}{p * r_p} - 1\right) (1 - 2p)\, r'_p. \qquad (23)$$

The optimal $r(p)$ should satisfy the Euler equation [45]

$$S_r - \frac{d}{dp} S_{r'} = 0 \qquad (24)$$

where

$$S_r = \frac{\partial S}{\partial r} = -\frac{(1 - 2p)^2 F(p)\, r'_p}{p * r_p - (p * r_p)^2},$$

$$S_{r'} = \frac{\partial S}{\partial r'} = (1 - 2p) F(p) \log \frac{1 - p * r_p}{p * r_p},$$

$$\frac{d S_{r'}}{dp} = [(1 - 2p) f(p) - 2F(p)] \log \frac{1 - p * r_p}{p * r_p} - \frac{(1 - 2p) F(p)\, [1 - 2r_p + (1 - 2p) r'_p]}{p * r_p - (p * r_p)^2}.$$

After some algebra (24) simplifies to

$$\frac{1 - 2r_p}{(p * r_p)(1 - p * r_p)} \cdot \frac{1}{\log(1 - p * r_p) - \log(p * r_p)} = \frac{(1 - 2p) f(p) - 2F(p)}{(1 - 2p) F(p)}. \qquad (25)$$

In general (25) has no closed-form solution but there exist obvious numerical approaches.

^The achievable rate $R(p)$ for any state is bounded by one, therefore $\int_\epsilon^{1/2} f(p) R(p)\, dp$, as a function of $\epsilon$, is right continuous
at $\epsilon = 0$. We can avoid the problematic limiting case by focusing on strictly positive $\epsilon$ and obtain the expected capacity (22)
by continuity.

As an example, suppose that the crossover probability is uniformly distributed on $[0, 1/2]$. The
Shannon capacity is limited by the worst channel state ($p = 1/2$), giving $C = 0$. The capacity
versus outage-$q$ is $C_q = 1 - h\left(\frac{1-q}{2}\right)$. To approximate the expected capacity, we solve for $r(p)$
in (25) for each $p$. It is seen that $0 < r_p < 1/2$ only for $p_l < p < p_u$, where the two cutoff
probabilities satisfy $r(p_l) = 0$ and $r(p_u) = 1/2$. For the uniform distribution case, $p_l = 0.136$ and
$p_u = 1/6$, which demonstrates that it is unnecessary to use the channel all the time to achieve
the expected capacity. In fact no information is sent for $p > 1/6$.
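
The cutoff probabilities quoted for the uniform case can be reproduced numerically. The sketch below is illustrative and relies on two assumptions made here: natural logarithms are used in the stationarity condition, and the condition is written in the rearranged form $(1 - 2r_p) / [x(1-x)(\ln(1-x) - \ln x)] = (1-4p)/[p(1-2p)]$ with $x = p * r_p$, which is what the Euler equation reduces to for $f(p) = 2$ and $F(p) = 2p$. The bisection helper and all function names are hypothetical.

```python
import math

def conv(a, b):
    """Binary convolution a * b = a(1-b) + (1-a)b."""
    return a * (1.0 - b) + (1.0 - a) * b

def euler_residual(p, r):
    """Left side minus right side of the stationarity condition (uniform p on [0, 1/2])."""
    x = conv(p, r)
    lhs = (1.0 - 2.0 * r) / (x * (1.0 - x) * (math.log(1.0 - x) - math.log(x)))
    rhs = (1.0 - 4.0 * p) / (p * (1.0 - 2.0 * p))
    return lhs - rhs

def bisect(fn, lo, hi, iters=200):
    """Plain bisection; assumes fn changes sign exactly once on [lo, hi]."""
    f_lo = fn(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = fn(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def lower_cutoff():
    """p_l: the value of p at which the solution of the condition is r = 0."""
    return bisect(lambda p: euler_residual(p, 0.0), 0.10, 0.16)

def optimal_r(p):
    """r(p) for p strictly between the two cutoffs."""
    return bisect(lambda r: euler_residual(p, r), 0.0, 0.5 - 1e-6)
```

Letting $r \to 1/2$ in the same condition gives $2/(1-2p) = (1-4p)/[p(1-2p)]$, that is $p_u = 1/6$, in agreement with the upper cutoff stated in the text.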
Fig. 4. Capacity under different definitions of BSC with random crossover probability.

Fig. 6. Effect of cutoff range



In Fig. 4 we plot the expected capacity, the outage-$q$ capacity, and the capacity versus outage-$q$.
Although the capacity versus outage-$q$ $C_q$ exceeds the expected capacity $C^e$ for some values of
$q$, the outage-$q$ capacity $C_q^o$ is always dominated by the expected capacity $C^e$, since an outage-$q$
code is one of many possible codes for the expected capacity. Define cutoff outage probabilities
$q_l = 1 - 2p_l$ and $q_u = 1 - 2p_u$. Note that $C_q^o \approx C^e$ for all $q \in [q_u, q_l]$. In this range an outage
code gives almost the same expected rate as a broadcast code.

In Fig. 5 we plot the rate used in each state by the expected capacity code and the capacity
versus outage codes at outage probabilities $q_l$, $q_u$ and $1/2$. We see that the code for outage
capacity achieves a constant rate for non-outage states and a rate 0 otherwise. For this example,
the incremental rates $|dR(p)|$ are nonzero only for $p_l < p < p_u$. Therefore the code for expected
capacity achieves a rate 0 when $p > p_u$. As $p$ decreases from $p_u$ to $p_l$, the rate gradually increases
from 0 to 0.38 bits per channel use, and stays at this constant level for $p < p_l$. Since all channels
are equally probable, the area under each curve is the expected rate of that strategy. The area
under the expected capacity curve is the largest. The expected capacity curve is, in some places,
lower than the curve for outage-$q_l$ capacity. Although the outage-$q_l$ code achieves a rate higher
than the broadcast code for expected capacity when $p < p_l$, the same code has decoding rate 0
for all other channel states $p > p_l$, giving a lower area under the total curve.

A potential advantage of the outage code is its simplicity. The transmission rate is fixed, so the 
code may be coupled with a conventional source code. The advantage of the expected capacity 
code is its higher expected rate. The code may be coupled with a multiresolution source code. 
It is not obvious which strategy yields better end-to-end coding performance in this example. In 
general, an expected rate code is required to achieve the optimal end-to-end distortion, but this 
code may use a rate vector on the boundary of the BC capacity region which is different from 
the rate vector used by the code that achieves the expected capacity [20]. 

The procedure to solve for the expected capacity is computationally intensive. In the above
example, when looking for the optimal $r(p)$ which leads to the expected capacity, we first identify
the cutoff probabilities $(p_l, p_u)$ and then solve (25) for each $p$ in this range. We want to emphasize
that the correct cutoff range, although seemingly a very coarse characterization of the optimal
solution, is crucial to the expected rate. Consider some alternative approaches:

• Optimal cutoff $[p_l, p_u]$ with suboptimal $r(p)$:

$$r(p) = \begin{cases} \frac{1}{2} \left( \frac{p - p_l}{p_u - p_l} \right)^{\gamma}, & p_l \le p \le p_u, \\ 0, & \text{otherwise}. \end{cases} \qquad (26)$$

• Cutoff range $[0, 1/2]$:

$$r(p) = (1/2)(2p)^{\gamma}. \qquad (27)$$

The choice of $\gamma$ makes $r(p)$ convex ($\gamma > 1$), linear ($\gamma = 1$) or concave ($\gamma < 1$) in both
approaches. In Fig. 6, for $\gamma$ ranging between 0 and 4, we plot the achievable expected rate using
the cutoff range $[0, 1/2]$ and suboptimal $r(p)$ as in (27), the achievable expected rate using the
optimal cutoff range $[p_l, p_u]$ and suboptimal $r(p)$ as in (26), and the expected capacity of this
channel. We observe that the optimal cutoff range yields an expected rate very close to $C^e$,
but the expected rate is clearly suboptimal if we use the cutoff range $[0, 1/2]$. By optimizing
the cutoff range we actually capture most benefit of the expected-rate code as compared to the
conventional code for Shannon capacity.
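
The comparison between the two cutoff choices can be approximated by discretizing the state distribution. The sketch below is illustrative and makes several assumptions that are ours: the uniform distribution on $[0, 1/2]$ is replaced by $n$ equiprobable BSC states at the midpoints of a regular grid, the layered rates follow the discrete broadcast formula $h(r_j * p_j) - h(r_{j-1} * p_j)$, base-2 logarithms are used, the cutoff profile is a power-law ramp between $p_l = 0.136$ and $p_u = 1/6$ that is held at $1/2$ above $p_u$ so that $r$ stays monotone, and the last layer is forced to $r = 1/2$. All function names are hypothetical.

```python
import math

def binary_entropy(x):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def conv(a, b):
    """Binary convolution a * b = a(1-b) + (1-a)b."""
    return a * (1.0 - b) + (1.0 - a) * b

def grid(n):
    """Midpoints of n equal cells of [0, 1/2], each state having probability 1/n."""
    return [(i + 0.5) * 0.5 / n for i in range(n)]

def expected_rate(r_of_p, n=2000):
    """Expected rate of the layered code induced by a nondecreasing profile r(p)."""
    rate, r_prev = 0.0, 0.0
    for j, p in enumerate(grid(n)):
        r_j = 0.5 if j == n - 1 else min(0.5, max(r_prev, r_of_p(p)))
        layer = binary_entropy(conv(r_j, p)) - binary_entropy(conv(r_prev, p))
        rate += (j + 1) / n * layer    # layer j is decoded by every state i <= j
        r_prev = r_j
    return rate

def full_range_profile(gamma):
    """Power-law profile over the whole range [0, 1/2]."""
    return lambda p: 0.5 * (2.0 * p) ** gamma

def cutoff_profile(gamma, p_l=0.136, p_u=1.0 / 6.0):
    """Power-law ramp restricted to [p_l, p_u]."""
    def r(p):
        if p < p_l:
            return 0.0
        if p > p_u:
            return 0.5
        return 0.5 * ((p - p_l) / (p_u - p_l)) ** gamma
    return r

def average_capacity(n=2000):
    """Average of 1 - h(p) over the grid, an upper bound on any expected rate."""
    return sum(1.0 - binary_entropy(p) for p in grid(n)) / n
```

Evaluating `expected_rate(cutoff_profile(g))` and `expected_rate(full_range_profile(g))` over a range of exponents `g` gives a rough numerical counterpart of the comparison described above, without claiming to reproduce the plotted curves.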

VII. Source-Channel Coding and Separation 

Channel capacity theorems deal with data transmission in a communication system. When 
extending the system to include the source of the data, we also need to consider the data 
compression problem. For the overall system, the end-to-end distortion is a well-accepted per- 
formance metric. When both the source and channel are stationary and ergodic, codes are usually 
designed to achieve the same end-to-end distortion level for any source sequence and channel 
realization. However, if the channel model is generalized to such scenarios as the composite 
channel above, it is natural to introduce generalized end-to-end distortion metrics such as the 
distortion versus outage and the expected distortion [46], similar to the development of alternative 
capacity definitions. These alternative distortion metrics are also considered in prior works [19], 
[20], [47]-[50]. 

The renowned source-channel separation theorem [21, Theorem 2.4] asserts that a target 
distortion level D is achievable if and only if the channel capacity C exceeds the source rate 
distortion function $R(D)$, and a two-stage separate source-channel code suffices to meet the
requirement. This theorem enables separate design of source and channel codes and guarantees
the optimal performance. However, there are a few underlying assumptions: a single-user channel;
a stationary ergodic source and channel; a single distortion level maintained for all transmissions.
It is known that the separation theorem fails if the first two assumptions do not hold [27], [51]. In 
fact, the end-to-end distortion metrics also dictate whether the source-channel separation holds 
for a communication system. In [46] we showed the direct part of source-channel separation 
under the distortion versus outage metric and established the converse for certain systems. On 
the contrary, source-channel separation does not hold under the expected distortion metric. 

Source-channel separation implies that the operation of source and channel coding does not
depend on the statistics of the counterpart. Meanwhile, the source and channel do need to
communicate with each other through an interface, which is a single number in the classical
separation theorem. For generalized source/channel models and distortion metrics, the interface
is not necessarily a single rate and may allow multiple parameters to be agreed on between
the source and channel encoders and decoders. As we expect a performance enhancement
when source and channel exchange more information through a more sophisticated interface, an
interesting topic for future research would be to characterize the tradeoff between interface
complexity and the achievable end-to-end performance [52].

^The separation theorem for lossless transmission [2] can be regarded as a special case of zero distortion.

VIII. Conclusions 

In view of the pessimistic nature of Shannon capacity for composite channels with CSIR, we
propose alternative capacity definitions including capacity versus outage and expected capacity. These
definitions lend insight to applications where side information at the receiver combined with
appropriate source coding strategies can exploit these more flexible notions of capacity. We prove
capacity theorems or bounds under each definition, and illustrate how expected achievable rates
can be improved through examples of Gilbert-Elliott channels and a BSC with random crossover
probabilities. While the use of capacity definitions inherently focuses our attention on achievable
(expected) rates, we note the existence of other meaningful measures of performance in the
given coding environment. For example, since outage-$q$ codes are compatible with conventional
source codes while expected capacity codes require multiresolution or multiple description codes,
depending on whether or not the corresponding broadcast channel is degraded, the fact that the
expected rate of the expected capacity code exceeds that of the outage-$q$ code does not guarantee
lower end-to-end expected distortion. Furthermore, since a non-ergodic channel experiences a
single ergodic mode for all time, there is some justification for performance measures that take 
the probability of suffering a very low-rate state into account. These topics provide a wealth of 
interesting questions for future research with some initial work presented in [19], [20], [46]. 

Appendix A
Proof of Lemma 1

We prove $C(W_1) \le C(W_2)$ if $p_2 \ll p_1$ (that is, $p_2$ is absolutely continuous with respect to $p_1$), and vice versa. Therefore equivalent probability
measures $p_1$ and $p_2$ imply identical Shannon capacity. The result is intuitive but we need to
address a subtle technical issue: note that $p_1$ and $p_2$ are channel state distributions, while the
Shannon capacity is defined through the information density distribution (7), which depends on
both input and channel statistics.

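Recall the Shannon capacity formula (8)

$$C(W_1) = \sup_X \sup\{\alpha : F_X(\alpha) = 0\}.$$

Denote by $X^*$ the input distribution that achieves the supremum in (8), and by $F_1(\alpha)$ the
corresponding information density distribution. For arbitrary $\epsilon > 0$, we define

$$M_\epsilon(\alpha) = \left\{ s : \lim_{n \to \infty} P_{X^{*n} Y^n | S}\left[ \frac{1}{n} i_{X^{*n} Y^n | S}(X^{*n}; Y^n | s) \le \alpha \,\Big|\, s \right] \ge \epsilon \right\}.$$

Notice that

$$\begin{aligned}
F_1(\alpha) &= \lim_{n \to \infty} P_{X^{*n} W_1^n}\left[ \frac{1}{n} i_{X^{*n} W_1^n}(X^{*n}; Y^n | S) \le \alpha \right] \\
&= \lim_{n \to \infty} \int P_{X^{*n} Y^n | S}\left[ \frac{1}{n} i_{X^{*n} Y^n | S}(X^{*n}; Y^n | s) \le \alpha \,\Big|\, s \right] p_1(s)\, ds \\
&= \int \lim_{n \to \infty} P_{X^{*n} Y^n | S}\left[ \frac{1}{n} i_{X^{*n} Y^n | S}(X^{*n}; Y^n | s) \le \alpha \,\Big|\, s \right] p_1(s)\, ds \\
&\ge \epsilon \int_{M_\epsilon(\alpha)} p_1(s)\, ds,
\end{aligned} \qquad (28)$$

where we exchange the order of integral and limit according to the dominated convergence theorem.
From (28) we see that $F_1(\alpha) = 0$ implies

$$\int_{M_\epsilon(\alpha)} p_1(s)\, ds = 0.$$

Assuming $p_2 \ll p_1$, it follows that

$$\int_{M_\epsilon(\alpha)} p_2(s)\, ds = 0.$$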

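Now define $F_2(\alpha)$ as the information density distribution of channel $W_2$ when evaluated at input
$X^*$, i.e.

$$\begin{aligned}
F_2(\alpha) &= \lim_{n \to \infty} P_{X^{*n} W_2^n}\left[ \frac{1}{n} i_{X^{*n} W_2^n}(X^{*n}; Y^n | S) \le \alpha \right] \\
&= \int_{\mathcal{S} - M_\epsilon(\alpha)} \lim_{n \to \infty} P_{X^{*n} Y^n | S}\left[ \frac{1}{n} i_{X^{*n} Y^n | S}(X^{*n}; Y^n | s) \le \alpha \,\Big|\, s \right] p_2(s)\, ds \\
&\quad + \int_{M_\epsilon(\alpha)} \lim_{n \to \infty} P_{X^{*n} Y^n | S}\left[ \frac{1}{n} i_{X^{*n} Y^n | S}(X^{*n}; Y^n | s) \le \alpha \,\Big|\, s \right] p_2(s)\, ds \\
&\le \epsilon \int_{\mathcal{S} - M_\epsilon(\alpha)} p_2(s)\, ds + \int_{M_\epsilon(\alpha)} p_2(s)\, ds \\
&\le \epsilon.
\end{aligned}$$

Since $\epsilon$ is arbitrary, we see that $F_1(\alpha) = 0$ implies $F_2(\alpha) = 0$, therefore

$$\begin{aligned}
C(W_1) &= \sup\{\alpha : F_1(\alpha) = 0\} \\
&\le \sup\{\alpha : F_2(\alpha) = 0\} \\
&\le C(W_2).
\end{aligned}$$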

Appendix B 

Proof of Capacity versus Outage Theorem (12)

We first prove the achievability of the capacity versus outage theorem (12). Consider a fixed outage probability $q > 0$.

Encoding: For any input distribution $P_{X^n}$, $\epsilon > 0$, and $R < \underline{I}_q(X;Y) - \epsilon$, generate the codebook by choosing $X^n(1), \cdots, X^n(2^{nR})$ i.i.d. according to the distribution $P_{X^n}(x^n)$.

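Decoding: Define, for $\epsilon > 0$, the typical set $A_\epsilon^{(n)}$ as

\[ A_\epsilon^{(n)} = \left\{ (x^n, y^n) : \frac{1}{n} i_{X^n W^n}(x^n; y^n) \ge \underline{I}_q(X;Y) - \epsilon \right\}. \]

For any channel output $Y^n$, we decode as follows:

1) If $(X^n(i), Y^n) \notin A_\epsilon^{(n)}$ for all $i \in \{1, \cdots, 2^{nR}\}$, declare an outage;

2) Otherwise, decode to the unique index $i \in \{1, \cdots, 2^{nR}\}$ such that $(X^n(i), Y^n) \in A_\epsilon^{(n)}$. An error is declared if more than one such index exists.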
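The decoding rule above can be sketched in a few lines. This is only an illustration under assumptions that are not in the text: the channel is a binary symmetric channel with crossover probability `delta`, the input is i.i.d. uniform (so each output symbol has probability 1/2), and `threshold` stands in for $\underline{I}_q(X;Y) - \epsilon$. Codeword indices are 0-based here, and all names are hypothetical.

```python
import math

def bsc_density(x, y, delta):
    """Normalized information density (1/n) i(x^n; y^n) in bits for a BSC with
    crossover probability delta and i.i.d. uniform inputs (assumed model)."""
    n = len(x)
    flips = sum(a != b for a, b in zip(x, y))
    return 1 + (flips * math.log2(delta) + (n - flips) * math.log2(1 - delta)) / n

def threshold_decode(codebook, y, delta, threshold):
    """Return ("outage", None) if no codeword is jointly typical with y,
    ("decoded", i) if exactly one is, and ("error", None) otherwise."""
    typical = [i for i, x in enumerate(codebook)
               if bsc_density(x, y, delta) >= threshold]
    if not typical:
        return ("outage", None)
    if len(typical) == 1:
        return ("decoded", typical[0])
    return ("error", None)

# Toy codebook with n = 4; real codes use long block lengths.
codebook = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 1, 1]]
print(threshold_decode(codebook, [0, 0, 0, 0], 0.1, 0.4))
print(threshold_decode(codebook, [0, 1, 0, 1], 0.1, 0.4))
```

The key design point is that an outage is recognized by the decoder itself: it is declared exactly when no codeword clears the information density threshold.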

Outage and Error Analysis: We recall the definition of events $E_{ji}$ in (10) as

\[ E_{ji} = \left\{ (X^n(j), Y^n) \in A_\epsilon^{(n)} \mid X^n(i) \text{ sent} \right\}. \]

Assuming equiprobable inputs, the expected probability of an outage using the above scheme is:

\[
\begin{aligned}
P_o^{(n)} &= \Pr\{\text{outage} \mid X^n(1) \text{ sent}\} \\
&= \Pr\left\{ \bigcap_{i=1}^{2^{nR}} E_{i1}^c \right\} \\
&\le q + \epsilon_n,
\end{aligned}
\]

where by definition of $\underline{I}_q(X;Y)$ we have $\epsilon_n$ approaching 0 for $n$ large enough. Likewise, when no outage is declared the expected probability of error is

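\[
\begin{aligned}
P_e^{(n)} &= \Pr\{\text{error} \mid X^n(1) \text{ sent and no outage declared}\} \\
&= \Pr\left\{ \bigcup_{i=2}^{2^{nR}} E_{i1} \right\} \\
&\le 2^{nR} \Pr\{E_{21}\} \\
&= 2^{nR} \sum_{(x^n, y^n) \in A_\epsilon^{(n)}} P_{X^n}(x^n) P_{Y^n}(y^n) \\
&\le 2^{n[R - \underline{I}_q(X;Y) + \epsilon]} \sum_{(x^n, y^n) \in A_\epsilon^{(n)}} P_{X^n W^n}(x^n, y^n),
\end{aligned}
\tag{29}
\]

where the last inequality is obtained by noticing that $(x^n, y^n) \in A_\epsilon^{(n)}$ implies

\[ \frac{1}{n} i_{X^n W^n}(x^n; y^n) = \frac{1}{n} \log \frac{P_{X^n W^n}(x^n, y^n)}{P_{X^n}(x^n) P_{Y^n}(y^n)} \ge \underline{I}_q(X;Y) - \epsilon \]

or equivalently

\[ P_{X^n}(x^n) P_{Y^n}(y^n) \le 2^{-n[\underline{I}_q(X;Y) - \epsilon]} P_{X^n W^n}(x^n, y^n). \]

From (29) we see that $P_e^{(n)} \to 0$ for all $R < \underline{I}_q(X;Y) - \epsilon$ and arbitrary $\epsilon > 0$, which completes our proof.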

Next we prove the converse of the capacity versus outage theorem (12). Consider any sequence of $(n, 2^{nR})$ codes with error probability $P_e^{(n)} \to 0$ and outage probability $\lim_{n\to\infty} P_o^{(n)} \le q$. Let $\{X^n(1), \cdots, X^n(2^{nR})\}$ represent the $n$th code in the sequence, and assume a uniform input distribution

\[
P_{X^n}(x^n) =
\begin{cases}
2^{-nR}, & \forall\, x^n \in \{X^n(1), \cdots, X^n(2^{nR})\}, \\
0, & \text{otherwise.}
\end{cases}
\]

For each $i \in \{1, \cdots, 2^{nR}\}$, let $D_i$ represent the decoding region associated with codeword $X^n(i)$ and $B_i$ equal an analog of the typical set, defined as

\[ B_i = \left\{ y^n : \frac{1}{n} \log \frac{P_{X^n|Y^n}(X^n(i)|y^n)}{P_{X^n}(X^n(i))} \le R - \gamma \right\}, \]

where $\gamma > 0$ is arbitrary. Then we have

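\[
\begin{aligned}
P_{X^n W^n}\left[ \frac{1}{n} i_{X^n W^n}(X^n; Y^n) \le R - \gamma \right]
&= \sum_{i=1}^{2^{nR}} \sum_{y^n \in B_i} P_{X^n W^n}(X^n(i), y^n) \\
&= \sum_{i=1}^{2^{nR}} \sum_{y^n \in B_i \cap D_i} P_{X^n W^n}(X^n(i), y^n) + \sum_{i=1}^{2^{nR}} \sum_{y^n \in B_i \cap D_i^c} P_{X^n W^n}(X^n(i), y^n) \\
&\le \sum_{i=1}^{2^{nR}} \sum_{y^n \in B_i \cap D_i} P_{X^n W^n}(X^n(i), y^n) + P_e^{(n)} + P_o^{(n)} \\
&\le 2^{-n\gamma} + P_e^{(n)} + P_o^{(n)},
\end{aligned}
\]

since the decoding regions $D_i$ cannot overlap. Thus

\[ P_e^{(n)} \ge P_{X^n W^n}\left[ \frac{1}{n} i_{X^n W^n}(X^n; Y^n) \le R - \gamma \right] - P_o^{(n)} - 2^{-n\gamma}, \]

which goes to zero if and only if $R - \gamma \le \underline{I}_q(X;Y)$ by definition of $\underline{I}_q(X;Y)$.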

Appendix C 
Proof of Theorem 1

A. Mapping Broadcast Code to Expected-rate Code 

We first show that any broadcast code can be mapped to an expected-rate code, so

\[ C^e \ge \sum_{p \in \mathcal{P}} R_p \sum_{s \in p} P_S(s) \tag{30} \]

for any $\{R_p\} \in \mathcal{C}_{BC}$.

Given a $(\{2^{nR_p}\}, n)$ BC code as defined in Definition 6, we represent each message $M_p \in \mathcal{M}_p$ in a binary format consisting of $nR_p$ bits and concatenate these bits to form an overall representation of $nR_t$ bits, where

\[ R_t = \sum_{p \in \mathcal{P}} R_p. \tag{31} \]

These $nR_t$ information bits are indexed by the index set $\mathcal{I}_{n,t} = \{1, 2, \cdots, nR_t\}$. We denote by $\mathcal{I}_{n,p}$ the set of indices of the $nR_p$ bits that correspond to the message set $\mathcal{M}_p$ in the BC code. Note that $\mathcal{I}_{n,p}$ may be empty for some $p \in \mathcal{P}$, that these index sets are mutually exclusive for different $p$, and that

\[ \mathcal{I}_{n,t} = \bigcup_{p \in \mathcal{P}} \mathcal{I}_{n,p}. \tag{32} \]

The $(\{2^{nR_p}\}, n)$ BC code can be mapped to the following expected-rate code with transmit rate $R_t$ given by (31). For any $M_t \in \mathcal{M}(\mathcal{I}_{n,t})$, the bits $(b_i)$ with $i \in \mathcal{I}_{n,p} \subseteq \mathcal{I}_{n,t}$ define a corresponding message $M_p$ in the message set $\mathcal{M}_p$ of the BC code. The encoder for the expected-rate code satisfies

\[ f_n^{e}(M_t) = f_n^{e}\Big( \prod_{p \in \mathcal{P},\, p \neq \phi} M_p \Big) = f_n^{BC}\Big( \prod_{p \in \mathcal{P},\, p \neq \phi} M_p \Big), \]

where the superscripts $e$ and $BC$ distinguish the encoder of the expected-rate code from that of the broadcast code. For a state $s$ in the composite channel, the receiver decodes those information bits with indices in the set

\[ \mathcal{I}_{n,s} = \bigcup_{p : s \in p} \mathcal{I}_{n,p}, \tag{33} \]

and the decoding rate is $R_s = \sum_{p : s \in p} R_p$. In state $s$ of the composite channel, the decoder output $\hat{M}_s$ is obtained by concatenating the binary representations $(b_i)_{i \in \mathcal{I}_{n,p}}$ of each $\hat{M}_p$, where $s \in p$ and

\[ \prod_{p : s \in p} \hat{M}_p = g_{n,s}^{BC}(Y^n) \]

is the decoder output of receiver $s$ in the broadcast channel. The decoding error probability for the expected-rate code in channel state $s$ is

\[ P_e^{(n,s)} = \Pr\{E_s\}, \]

where the error event $E_s$ for the broadcast code is defined in (15). Notice that

\[ P_e^{(n,s)} = \Pr\{E_s\} \le \Pr\Big\{ \bigcup_{s'} E_{s'} \Big\} = P_e^{(n)}, \]

so the expected error probability

\[ \mathbb{E}_S P_e^{(n,S)} \le P_e^{(n)} \to 0 \]

as $n \to \infty$, according to the BC code definition. Therefore the rate

\[ R = \mathbb{E}_S R_S = \sum_{s} P_S(s) R_s = \sum_{s} P_S(s) \sum_{p : s \in p} R_p \]

is an achievable expected rate and (30) is proved.
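The rate bookkeeping in this mapping can be checked with a short sketch. It assumes a finite state set and represents each subset $p$ of states as a `frozenset`; the per-subset rates and state probabilities are made-up numbers, and the function names are hypothetical. The sketch computes the decoding rate in each state as the sum of $R_p$ over the subsets containing that state, and confirms that averaging over states gives the same value as the subset-wise sum on the right-hand side of (30).

```python
def decoding_rates(bc_rates, states):
    """R_s = sum of R_p over all subsets p that contain state s."""
    return {s: sum(r for p, r in bc_rates.items() if s in p) for s in states}

def expected_rate(bc_rates, state_probs):
    """E_S R_S = sum_s P_S(s) R_s."""
    rates = decoding_rates(bc_rates, state_probs)
    return sum(state_probs[s] * rates[s] for s in state_probs)

def expected_rate_by_subset(bc_rates, state_probs):
    """sum_p R_p sum_{s in p} P_S(s), the same quantity grouped by subset."""
    return sum(r * sum(state_probs[s] for s in p) for p, r in bc_rates.items())

# Hypothetical two-state example: 0.3 bits decodable only in state 1,
# 0.4 bits decodable in both states.
state_probs = {1: 0.5, 2: 0.5}
bc_rates = {frozenset({1}): 0.3, frozenset({1, 2}): 0.4}

print(decoding_rates(bc_rates, state_probs))
print(expected_rate(bc_rates, state_probs))
print(expected_rate_by_subset(bc_rates, state_probs))
```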

B. Mapping Expected-rate Code to Broadcast Code 
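Next we show that for any fixed $\epsilon > 0$,

\[ C^e - \epsilon < \sup_{\{R_p\} \in \mathcal{C}_{BC}} \sum_{p \in \mathcal{P}} R_p \sum_{s \in p} P_S(s). \tag{34} \]

According to the definition of the expected capacity, there exists a sequence of $\{(2^{nR_t}, \{2^{nR_s}\}, n)\}$ codes such that

\[ \mathbb{E}_S R_S = R > C^e - \epsilon \tag{35} \]

and $\mathbb{E}_S P_e^{(n,S)} \to 0$. The transmitted information bits are indexed by $\mathcal{I}_{n,t} = \{1, 2, \cdots, nR_t\}$.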
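Since the transmitter and the receiver agree on the index set $\mathcal{I}_{n,s}$ of those information bits that can be reliably decoded in each channel state $s$, the transmitter can define, for each subset $p \in \mathcal{P}$ of channel states, the index set $\mathcal{I}_{n,p}$ of those information bits decodable exclusively for channel states within $p$, i.e.

\[ \mathcal{I}_{n,p} = \Big( \bigcap_{s \in p} \mathcal{I}_{n,s} \Big) \cap \Big( \bigcap_{s \notin p} \mathcal{I}_{n,s}^{c} \Big), \]

where

\[ \mathcal{I}_{n,s}^{c} = \mathcal{I}_{n,t} - \mathcal{I}_{n,s} \]

is the complement index set of $\mathcal{I}_{n,s}$. Denote by $nR_p$ the cardinality of $\mathcal{I}_{n,p}$. We observe that the sets $\mathcal{I}_{n,p}$ are mutually exclusive, that the relationships (32) and (33) still hold, and that the decoding rate satisfies

\[ R_s = \sum_{p : s \in p} R_p. \]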
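The construction of the exclusive index sets can be sketched as follows, assuming a finite number of states and a small block of information bits. Each bit is assigned to the subset $p$ consisting of exactly those states in which it is decodable, which is one way to realize "decodable exclusively for channel states within $p$". The function names and the example sets are hypothetical.

```python
from collections import defaultdict

def exclusive_index_sets(decodable, all_bits):
    """decodable[s] is the set of bit indices reliably decoded in state s.
    Returns a dict mapping each subset p (a frozenset of states) to the bits
    decodable in every state of p and in no state outside p. Subsets with no
    bits are omitted."""
    groups = defaultdict(set)
    for b in all_bits:
        p = frozenset(s for s, bits in decodable.items() if b in bits)
        groups[p].add(b)
    return dict(groups)

def decodable_in_state(partition, s):
    """Union of the exclusive sets over all subsets p that contain s."""
    out = set()
    for p, bits in partition.items():
        if s in p:
            out |= bits
    return out

# Hypothetical example with six information bits and three states.
all_bits = set(range(1, 7))
decodable = {1: {1, 2, 3, 4}, 2: {1, 2}, 3: {1, 2, 5}}
partition = exclusive_index_sets(decodable, all_bits)
print(partition)
```

Because every bit lands in exactly one subset, the resulting sets are mutually exclusive and cover all transmitted bits, and taking the union over the subsets containing a state recovers the bits decodable in that state.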
The $\{(2^{nR_t}, \{2^{nR_s}\}, n)\}$ expected-rate code can be mapped to the following BC code. Define the message set of the BC code as

\[ \mathcal{M}_p = \mathcal{M}(\mathcal{I}_{n,p}) \]

in the sense that each message $M_p \in \mathcal{M}_p$ has the corresponding binary representation $(b_i)_{i \in \mathcal{I}_{n,p}}$. The encoder for the BC code satisfies

\[ f_n^{BC}\Big( \prod_{p \in \mathcal{P}} M_p \Big) = f_n^{e}(M_t), \]

where $M_t = \prod_{p \in \mathcal{P}} M_p$ is obtained by concatenating the binary representations of each $M_p$. When the composite channel is in state $s$, the decoder output is

\[ \hat{M}_s = g_{n,s}^{e}(Y^n). \]

Since $\mathcal{I}_{n,p} \subseteq \mathcal{I}_{n,s}$ for any $p$ satisfying $s \in p$, we define the decoder output for receiver $s$ in the BC to be

\[ g_{n,s}^{BC}(Y^n) = \prod_{p : s \in p} \hat{M}_p, \]

where the binary representation $(b_i)_{i \in \mathcal{I}_{n,p}}$ of each $\hat{M}_p$ can be obtained by the corresponding bits in $\hat{M}_s$.

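The error event $E_s$ for receiver $s$ of the BC is defined in (15) with the error probability

\[ \Pr\{E_s\} = P_e^{(n,s)}, \]

and the overall error probability

\[ P_e^{(n)} = \Pr\Big\{ \bigcup_{s} E_s \Big\} \le \sum_{s} \Pr\{E_s\} = \sum_{s} P_e^{(n,s)}. \]

By definition of the expected-rate capacity,

\[ \lim_{n\to\infty} \mathbb{E}_S P_e^{(n,S)} = 0. \]

Assuming each channel state $s$ occurs with strictly positive probability, i.e. $\min_{s \in \mathcal{S}} P_S(s) > 0$, then $\mathbb{E}_S P_e^{(n,S)} \to 0$ implies

\[ P_e^{(n)} \le \sum_{s} P_e^{(n,s)} \le \frac{\mathbb{E}_S P_e^{(n,S)}}{\min_{s \in \mathcal{S}} P_S(s)} \to 0. \]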
Therefore the code constructed above is a valid BC code, i.e. $\{R_p\} \in \mathcal{C}_{BC}$, and we conclude

\[
\begin{aligned}
R = \mathbb{E}_S R_S &= \sum_{s} P_S(s) R_s = \sum_{s} P_S(s) \sum_{p : s \in p} R_p \\
&\le \sup_{\{R_p\} \in \mathcal{C}_{BC}} \sum_{p \in \mathcal{P}} R_p \sum_{s \in p} P_S(s).
\end{aligned}
\tag{36}
\]

From (35) and (36) we see the inequality (34) is established. Since $\epsilon$ is arbitrary, Theorem 1 is a result of (30) and (34).

Appendix D 
Proof of ([Tt]) 

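Consider a two-user BC where the channel to each user is a BEC with erasure probability $\alpha_i$, $i = 1, 2$, i.e. the conditional marginal distribution satisfies

\[
p(y_i|x) =
\begin{cases}
1 - \alpha_i, & y_i = x, \\
\alpha_i, & y_i = e.
\end{cases}
\]

Assuming $\alpha_1 \le \alpha_2$, we observe that the BC is stochastically degraded since

\[ p(y_2|x) = \sum_{y_1} p(y_1|x)\, p'(y_2|y_1), \]

where $p'(e|e) = 1$ and for $y_1 \ne e$

\[
p'(y_2|y_1) =
\begin{cases}
\dfrac{1 - \alpha_2}{1 - \alpha_1}, & y_2 = y_1, \\[2ex]
\dfrac{\alpha_2 - \alpha_1}{1 - \alpha_1}, & y_2 = e.
\end{cases}
\]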

Therefore the capacity region of the BEC-BC is the convex hull of the closure of all $(R_1, R_{12})$ satisfying

\[
\begin{aligned}
R_1 &\le I(X; Y_1|U), \\
R_{12} &\le I(U; Y_2),
\end{aligned}
\tag{37}
\]

for some joint distribution $p(u)p(x|u)p(y_1, y_2|x)$. Since the cardinality of the random variable $U$ is bounded by $|\mathcal{U}| \le \min\{|\mathcal{X}|, |\mathcal{Y}_1|, |\mathcal{Y}_2|\} = 2$ [1, p. 422] and the channel is symmetric with respect to the alphabet symbols 0 and 1, we can take $p(u) \sim$ Bernoulli(1/2) and $p(x|u)$ as the transition probability of a binary symmetric channel with crossover probability $p$. This stochastically degraded BEC-BC together with the auxiliary random variable $U$ is illustrated in Fig. 7.
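The capacity region (37) is evaluated to be

\[
\begin{aligned}
R_1 &\le (1 - \alpha_1)\, h(p), \\
R_{12} &\le (1 - \alpha_2)[1 - h(p)],
\end{aligned}
\tag{38}
\]

where $h(p) = -p \log p - (1 - p) \log(1 - p)$ is the binary entropy function. Assuming the two ergodic components are equally probable in the composite channel, the achievable expected rate using a broadcast code is then

\[
R = \sup_{p} \{R_{12} + R_1/2\} = \max\left\{ 1 - \alpha_2,\; \frac{1 - \alpha_1}{2} \right\},
\]

where the last step holds because the objective is linear in $h(p) \in [0, 1]$ and is therefore maximized at one of the endpoints $h(p) = 0$ or $h(p) = 1$.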
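The optimization over the crossover probability $p$ can be checked numerically. The sketch below evaluates the boundary rates of (38) with equality, forms the expected rate $R_{12} + R_1/2$ for two equally probable states, and maximizes it over a grid of $p \in [0, 1/2]$. The grid search and the function names are illustrative choices; the erasure probabilities are made-up numbers.

```python
import math

def binary_entropy(p):
    """Binary entropy h(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bec_bc_rates(p, a1, a2):
    """Boundary rates of (38): R_1 = (1 - a1) h(p), R_12 = (1 - a2)(1 - h(p))."""
    h = binary_entropy(p)
    return (1 - a1) * h, (1 - a2) * (1 - h)

def expected_rate(p, a1, a2):
    """State 1 decodes R_12 + R_1, state 2 decodes R_12; each has probability 1/2."""
    r1, r12 = bec_bc_rates(p, a1, a2)
    return r12 + r1 / 2

def best_expected_rate(a1, a2, grid=1001):
    """Maximize the expected rate over a uniform grid of p in [0, 1/2]."""
    return max(expected_rate(k / (2 * (grid - 1)), a1, a2) for k in range(grid))

print(best_expected_rate(0.1, 0.3), max(1 - 0.3, (1 - 0.1) / 2))
print(best_expected_rate(0.1, 0.8), max(1 - 0.8, (1 - 0.1) / 2))
```

In both examples the grid maximum coincides with the closed form: either all rate goes to the common message ($p = 0$) or all rate goes to the better user ($p = 1/2$).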



Appendix E 

Proof of Upper Bound for Expected Capacity 

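Denote by $X_s^n(1), \cdots, X_s^n(2^{nR_s})$ and $D_s(1), \cdots, D_s(2^{nR_s})$ the set of codewords and decoding regions corresponding to channel $s$. We fix $\gamma > 0$ and define for each $s \in \mathcal{S}$ and $1 \le i \le 2^{nR_s}$

\[
\begin{aligned}
B_s(i) &= \left\{ y^n \in \mathcal{Y}^n : \frac{1}{n} i_{X^n W^n}(X_s^n(i); y^n|s) \le R_s - \gamma \right\} \\
&= \left\{ y^n \in \mathcal{Y}^n : P_{X^n|Y^n,S}(X_s^n(i)|y^n, s) \le 2^{-n\gamma} \right\},
\end{aligned}
\tag{39}
\]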
where the second expression in (39) follows because each of the $2^{nR_s}$ codewords is sent with probability $2^{-nR_s}$. Notice that for any $s$ with $R_s > 0$,



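\[
\begin{aligned}
P_{X^n Y^n|S}\left[ \frac{1}{n} i_{X^n W^n}(X^n; Y^n|s) \le R_s - \gamma \,\middle|\, s \right]
&= \sum_{i=1}^{2^{nR_s}} \sum_{y^n \in B_s(i) \cap D_s^c(i)} P_{X^n Y^n|S}(X_s^n(i), y^n|s) \\
&\quad + \sum_{i=1}^{2^{nR_s}} \sum_{y^n \in B_s(i) \cap D_s(i)} P_{X^n Y^n|S}(X_s^n(i), y^n|s) \\
&\le P_e^{(n,s)} + 2^{-n\gamma}.
\end{aligned}
\tag{40}
\]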



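Furthermore we have

\[
\begin{aligned}
\mathbb{E}_S \liminf_{n\to\infty} P_{X^n Y^n|S}\left[ \frac{1}{n} i_{X^n W^n}(X^n; Y^n|S) \le R_S - \gamma \,\middle|\, S \right]
&\le \lim_{n\to\infty} \mathbb{E}_S P_{X^n Y^n|S}\left[ \frac{1}{n} i_{X^n W^n}(X^n; Y^n|S) \le R_S - \gamma \,\middle|\, S \right] \\
&\le \lim_{n\to\infty} \left[ \mathbb{E}_S P_e^{(n,S)} + 2^{-n\gamma} \right] \\
&= 0,
\end{aligned}
\]

where the chain of inequalities follows from Fatou's lemma, (40), and the code constraint

\[ \lim_{n\to\infty} \mathbb{E}_S P_e^{(n,S)} = 0. \]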

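Since the probability must be non-negative, we conclude

\[ \liminf_{n\to\infty} P_{X^n Y^n|S}\left[ \frac{1}{n} i_{X^n W^n}(X^n; Y^n|S) \le R_S - \gamma \,\middle|\, S \right] = 0 \]

almost surely (a.s.) in $S$. Thus for any $\epsilon > 0$,

\[ P_{X^n Y^n|S}\left[ \frac{1}{n} i_{X^n W^n}(X^n; Y^n|S) \le R_S - \gamma \,\middle|\, S \right] < \epsilon \]

occurs infinitely often a.s. Assuming $\left| \frac{1}{n} i_{X^n W^n}(X^n; Y^n|S) \right|$ is bounded by $M$, we then have

\[ \mathbb{E}_{X^n Y^n|S}\left[ \frac{1}{n} i_{X^n W^n}(X^n; Y^n|S) \,\middle|\, S \right] \ge (R_S - \gamma)(1 - \epsilon) - \epsilon M \]

also occurs infinitely often a.s. Since $\epsilon$ is arbitrary, we see that

\[ \mathbb{E}_S \mathbb{E}_{X^n Y^n|S}\left[ \frac{1}{n} i_{X^n W^n}(X^n; Y^n|S) \,\middle|\, S \right] \ge \mathbb{E}_S R_S - \gamma \]

occurs infinitely often for arbitrary $\gamma$, which gives us the upper bound (18) for expected capacity. Note that the expectation in the upper bound (18) is indeed $\frac{1}{n} I(X^n; Y^n|S)$, so the upper bound can also be proved using the standard technique of Fano's inequality.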